I. In Vitro Evolution of Chemicals

William Paley wrote the following 57 years before 
the Origin of Species: 

"In crossing a heath, suppose...I had found a watch upon the ground, and it should be inquired how the watch happened to be in that place.... [T]he inference, we think, is inevitable, that the watch must have had a maker: that there must have existed...an artificer or artificers, who formed it for the purpose which we find it actually to answer; who comprehended its construction and designed its use... [And] every indication of contrivance, every manifestation of design, which existed in the watch, exists in the works of nature; with the difference, on the side of nature, of being greater or more, and that in a degree which exceeds all computation." 1

But ever since Darwin we have come to understand 
that the exquisite "watches" of the living world are 
fashioned by an altogether different process. As 
Richard Dawkins writes in his compelling book on 
evolution, natural selection "does not plan for the 
future. It has no vision, no foresight, no sight at all. 
If it can be said to play the role of watchmaker in 
nature, it is the blind watchmaker." 2 

Imagine, then, the applied chemist, not as designer 
of molecules with a particular purpose, but rather 
as custodian of a highly diverse population of chemi- 
cals evolving in vitro as if they were organisms 
subject to natural selection. A chemical's "fitness" 
in this artificial biosphere would be imposed by the 
custodian for his or her own ends. For instance, the 
population might be culled periodically of individuals 
who fail to bind tightly to some biological receptor; 
the population would then evolve toward specific 
ligands for that receptor. (In this review, we will use 
"receptor" as a generic term for a biomolecule that 
specifically binds a natural ligand. This definition 
encompasses enzymes, which bind their substrates; 
hormone receptors, which bind their hormones; an- 
tibodies, which bind their antigens; and other ex- 
amples.) Progress toward the custodian's chosen goal 
would in a sense be "automatic": once appropriate 
selection conditions are devised, no plan for how the 
system is to meet the demands of selection need be 
specified. And if the chemical population is suf- 
ficiently diverse, perhaps this "blind" process will 
outperform rational design. The custodian may not 
comprehend, even in retrospect, how the products of 
selection work, just as biologists have only the 
sketchiest understanding of how a fruitfly functions. 

The key characteristics of evolving organisms are 
replicability (i.e., ability to make copies of themselves) 
and mutability (i.e., ability to undergo changes that 
are passed on to their progeny). How can a chemical
"replicate" or "mutate"? Actually, the living world 
abounds in just such evolving chemicals. Take a 
protein as an example: it cannot replicate or mutate 
directly, of course; but it is associated with a cell or 
multicellular organism that can. Linkage of the 
protein with the genetic machinery that encodes it 
thus confers on it the key properties of replicability 
and mutability. 

Phage display, the subject of this review, is a 
practical realization of the artificial chemical evolu- 
tion envisioned above. Using standard recombinant 
DNA technology, peptides (or proteins; we shall often 
use the term "peptide" to refer to an amino acid chain 
regardless of its length) are associated with replicating viral DNAs that include the peptides' coding
sequences. The peptide populations so created are 
managed by simple microbiological methods. Phage 
display is an exponentially growing research area, 
and numerous reviews covering different aspects of 
it have been published in recent years. 3-18 

This review is addressed primarily to chemists, but 
it does assume a rudimentary knowledge of molecular 
biology, including replication of DNA, expression of
genes (transcription of DNA into mRNA starting at
a promoter, and translation of mRNA into protein 
using the genetic code), and use of recombinant DNA 
vectors to clone foreign DNA inserts. 

II. Phage-Display Libraries as Populations of Replicable, Mutable Chemicals

A. Phage-Display Vectors 

Phages are viruses that infect bacterial cells, and 
many of the vectors used in recombinant DNA 
research are phages that infect the standard recom- 
binant DNA host: the bacterium Escherichia coli. 
The key feature of recombinant DNA vectors, includ- 
ing phages, is that they accommodate segments of 
"foreign" DNA— pieces of human DNA, for instance, 
or even stretches of chemically synthesized DNA. As 
vector DNA replicates in its E. coli host, then, the 
foreign "insert" replicates along with it as a sort of 
passenger. 

An "expression vector," including a phage-display 
vector, has an additional feature compared to vectors 
in general: the foreign DNA is "expressed" as a 
protein. That is, it programs machinery of the E. coli 
host cell to synthesize a foreign peptide whose amino 
acid sequence is determined (via the genetic code) by
the nucleotide sequence of the insert. Phage display 
differs from conventional expression systems, how- 
ever, in that the foreign gene sequence is spliced into 
the gene for one of the phage coat proteins, so that 
the foreign amino acid sequence is genetically fused 
to the endogenous amino acids of the coat protein to 
make a hybrid "fusion" protein. The hybrid coat 
protein is incorporated into phage particles ("virions") 
as they are released from the cell, so that the foreign 
peptide or protein domain is displayed on the outer 
surface. 
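The in-frame splicing just described can be illustrated with a short Python sketch. It translates DNA with the standard genetic code and fuses a hypothetical six-codon insert directly ahead of codons for AETVE, the N-terminal residues of mature pIII that flank many of the inserts in Table 1. The signal peptide and the real vector sequence are omitted, and the helper name `translate` is an illustrative assumption, not part of any published vector design.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order with the third base varying fastest, matching the string below.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Hypothetical six-codon foreign insert, spliced in frame directly upstream of
# codons for the mature pIII N-terminus (AETVE); signal peptide omitted.
foreign_insert = "TGGCATCCGCAGTTTAAC"
pIII_n_terminus = "GCTGAAACTGTTGAA"

fusion = translate(foreign_insert + pIII_n_terminus)
print(fusion)  # foreign residues followed by AETVE
```

Because the insert is a whole number of codons, the coat-protein residues downstream of it are read in their normal frame; an insert one base shorter would garble everything after it.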

A phage-display "library" is a heterogeneous mix- 
ture of such phage clones, each carrying a different 
foreign DNA insert and therefore displaying a dif- 
ferent peptide on its surface. Different types of 
libraries will be discussed below. Each peptide in the 
library can replicate, since when the phage to which 
it is attached infects a fresh bacterial host cell, it 
multiplies to produce a huge crop of identical progeny 
phages displaying the same peptide. And if the 
phage's DNA suffers a mutation in the peptide coding 
sequence, that mutation is passed on to the phage's 
progeny and can affect the structure of the peptide. 
In short, the peptides in a phage-display library have 
the two key characteristics required for chemical 
evolution: replicability and mutability. 

Because of its accessibility to solvent, a displayed 
peptide often behaves essentially as it would if it were 
not attached to the virion surface. Thus, for example, 
peptides that are ligands for receptors typically retain 
their affinity and specificity when displayed in this 
way on the virion surface. This means that many 
techniques that chemists or biochemists apply to 
compounds free in solution can be applied more or 
less unaltered to peptides tethered to a phage. In 
particular, affinity purification, in which an im- 
mobilized receptor is used to specifically "capture" 
ligands from a complex mixture of compounds, can 
equally be used to capture phage displaying receptor- 



Phage Display 

binding peptides from a large phage library display- 
ing many different peptide structures. The captured 
phages are "amplified" by infecting them en masse 
into fresh cells and culturing the cells to yield a large 
crop of progeny phages, which can serve as the input 
for another round of affinity purification. Moreover, 
by periodically introducing mutations into the phage 
population, the experimenter widens the search for 
effective ligands by exploring peptide sequences that 
are not present in the initial phage-display library 
(section V). Eventually, captured phages are cloned
so that the displayed peptides responsible for binding 
can be studied individually. The amino acid se- 
quence of the peptide is easily obtained by determin- 
ing the corresponding coding sequence in the viral 
DNA. This so-called "affinity selection" is the pre- 
mier example of artificial selection imposed on popu- 
lations of phage-displayed peptides (section IV.B). 
Since there is no need to process clones one by one 
until the final stage, enormous libraries displaying 
billions of different structures can be easily surveyed 
for exceedingly rare binding clones. 

B. How Foreign Peptides Are Displayed on 
Filamentous Phages 

Most phage-display work— and all the work to be 
reviewed here— has used filamentous phage strains 
M13, fd, and fl as the vectors; display systems based 
on bacteriophage T4 19,20 and λ 21 are extremely promising, but will not be reviewed here. Filamentous phages are flexible rods about 1 μm long and 6 nm in diameter, composed mainly (87% by mass) of a tube of helically arranged molecules of the 50-residue major coat protein pVIII 22; there are 2700 copies in wild-type virions, encoded by a single phage gene VIII. Inside this tube lies the single-stranded viral DNA (ssDNA; 6407-6408 nucleotides in wild-type strains). At one tip of the particle there are five copies each of the minor coat proteins pIII and pVI (genes III and VI, respectively); minor coat proteins pVII and pIX (genes VII and IX) are at the other tip.
The phages infect strains of E. coli that display a 
threadlike appendage called the F pilus. Infection 
is initiated by attachment of the N-terminal domain 
of pIII (about 200 amino acids) to the tip of the pilus;
this is the end of the particle that enters the cell first. 
As the process continues, the coat proteins dissolve 
into the surface envelope of the cell and the uncoated 
ssDNA concomitantly enters the cytoplasm. There, 
a complementary DNA strand is synthesized by host 
machinery, resulting in a double-stranded replicative 
form (RF). The RF replicates to make progeny RFs 
and is also the template for transcription of phage 
genes and synthesis of progeny ssDNAs. These 
progeny ssDNAs are extruded through the cell en- 
velope, acquiring the coat proteins from the mem- 
brane and emerging as completed virions (several 
hundred per cell per division cycle). Progeny virions 
are secreted continuously without killing the host; 
chronically infected cells continue to divide, though 
at a slower rate than uninfected cells. The yield of 
virions can exceed 0.3 mg/mL. 
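The mass figures above can be cross-checked with simple arithmetic. This sketch assumes approximate literature values that are not given in the text, about 5.24 kDa per pVIII subunit and about 16.3 MDa per wild-type virion; with those assumptions the 2700 pVIII copies account for roughly the 87% of particle mass quoted above, and a yield of 0.3 mg/mL corresponds to on the order of 10^13 virions/mL.

```python
AVOGADRO = 6.02214076e23  # particles per mole

# Approximate literature values (assumptions; not given in the text):
PVIII_MASS_DA = 5_240     # one 50-residue pVIII subunit, in daltons (g/mol)
VIRION_MASS_DA = 16.3e6   # one wild-type fd virion, in daltons (g/mol)

PVIII_COPIES = 2700       # pVIII copies per wild-type virion (from the text)


def pviii_mass_fraction(copies: int = PVIII_COPIES) -> float:
    """Fraction of the virion's mass contributed by its pVIII tube."""
    return copies * PVIII_MASS_DA / VIRION_MASS_DA


def virions_per_ml(yield_mg_per_ml: float) -> float:
    """Convert a virion mass yield (mg/mL) into particles per mL."""
    moles_per_ml = (yield_mg_per_ml * 1e-3) / VIRION_MASS_DA
    return moles_per_ml * AVOGADRO


print(f"pVIII share of virion mass: {pviii_mass_fraction():.0%}")
print(f"0.3 mg/mL corresponds to about {virions_per_ml(0.3):.1e} virions/mL")
```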

Foreign peptides have been fused to three coat proteins: pIII, pVIII, and pVI. The first two of these are synthesized with N-terminal signal peptides, which are cleaved from the polypeptide chain as it is inserted in the inner membrane of the cell (the bacterial envelope has inner and outer membranes separated by the periplasm). A single segment of amino acids in pIII and pVIII spans the inner membrane, separating a periplasmic N-terminal segment from a short cytoplasmic C-terminal segment; it is from this state that the proteins are incorporated into virions.

Figure 1. Schematic diagram of how foreign peptide domains are fused to coat proteins pIII, pVIII, and pVI in phage-display vectors. In each diagram, the black segment represents the foreign peptide.

Figure 1 diagrams the ways that foreign peptides have been fused to these proteins. Until recently, foreign peptides had been fused to regions of pVIII and pIII that were known to be exposed to the exterior: the N-terminus of pVIII 23 and the N-terminus and middle of pIII. 24,25 In some pIII vectors, the foreign peptide replaces the N-terminal domain of pIII (the third diagram in Figure 1), yielding a hybrid protein that can be incorporated into the virion but must be supplemented by complete pIII molecules if the virion is to be infective (see type 3+3 systems in the next subsection); infective virions in this case are thus mosaics with two types of pIII molecule. Similarly, when pVIII displays a relatively large foreign peptide (more than about eight amino acids), it will not support phage production unless it is supplemented by wild-type pVIII molecules, again yielding mosaic particles. 26-29

In pIII and pVIII fusions, the foreign peptide must be spliced somewhere between the signal peptide and the portion of the coat protein that is required for incorporation into the virion. This means that the reading frame of the foreign DNA insert must be fused correctly to the reading frame of the coat protein at both the vector-insert and insert-vector junctions (corresponding to the left and right ends of the black foreign peptide in Figure 1). More recently, however, foreign peptides have been fused to the C-terminus of pVI, 30 as shown in the sixth diagram in Figure 1; in this case, the two reading frames need only be fused correctly at the vector-insert junction (corresponding to the left end of the black segment in Figure 1). C-Terminally fused peptides have also been displayed indirectly on pIII via a leucine zipper "fastener", 31 as shown in the fifth diagram. Here, both the foreign peptide and pIII are preceded by signal peptides; after the signal peptides are removed, the two "half-zippers" join together in the periplasm prior to, or concomitantly with, incorporation into the secreted virion.

Figure 2. Types of phage display systems. Type 6, 66, and 6+6 systems are not shown. The long vertical ovals represent phage virions, and the shorter vertical ovals represent phagemid virions. The twisted line inside each virion represents the single-stranded viral DNA, the segments encoding coat proteins pVIII and pIII being designated by black and white boxes, respectively. The hatched segments within these boxes represent foreign coding sequences spliced into a coat-protein gene, and the hatched circles on the surface of the virions represent the foreign peptides specified by these foreign coding sequences. The five white circles at one tip of the virions represent the N-terminal domains of the five pIII molecules; foreign peptides displayed on pIII are either appended to the N-terminal domain (type 3 systems) or replace the N-terminal domain (type 3+3 and most type 33 systems). In type 8 systems, the foreign peptide is displayed on all copies of the major coat protein pVIII (2700 copies in wild-type virions), whereas in type 88 and 8+8 systems, only a minority of the pVIII copies display the foreign peptide. (Reproduced from ref 33. Copyright 1993 Elsevier.)
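The different reading-frame requirements of internal and C-terminal fusions can be expressed as two small checks. In this sketch (hypothetical helper names, standard genetic code, signal peptides and real vector sequences omitted), an insert placed inside a coat-protein gene must keep the frame at both junctions and must not contain a stop codon, whereas an insert appended to the C-terminus of pVI is simply read from the vector-insert junction until the first in-frame stop.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def complete_codons(seq: str) -> list[str]:
    """Split a sequence into complete codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def internal_fusion_ok(insert: str) -> bool:
    """pIII/pVIII-style fusion: the insert is flanked by vector sequence on both
    sides, so its length must be a multiple of 3 (both junctions in frame) and it
    must not contain an in-frame stop codon that would truncate the coat protein."""
    if len(insert) % 3:
        return False
    return all(CODON_TABLE[c] != "*" for c in complete_codons(insert))


def c_terminal_peptide(insert: str) -> str:
    """pVI-style C-terminal fusion: only the vector-insert junction must be in
    frame. Returns the residues read from the junction up to the first stop codon.
    (If the insert has no stop, this sketch returns only the insert's own residues;
    in reality translation would continue into downstream sequence.)"""
    residues = []
    for codon in complete_codons(insert):
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


insert = "TGGCATCCGCAGTTTAACTAAGC"  # hypothetical 23-nt insert with a TAA stop
print(internal_fusion_ok(insert))   # length 23 is not a multiple of 3
print(c_terminal_peptide(insert))   # residues read before the stop codon
```

The same hypothetical insert fails the internal-fusion check but would still yield a displayed peptide (WHPQFN) as a C-terminal fusion, which is the practical difference between the two kinds of junction.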

C. Types of Phage-Display Systems 

Phage-display systems can be classified according to the arrangement of the coat protein genes. 32,33 This is illustrated for fusions to pVIII and pIII in Figure 2, in which gene VIII is represented as a black block, gene III as a white block, the foreign DNA insert as a cross-hatched block, and the foreign peptide as a cross-hatched circle. In a "type 3" vector, there is a single phage chromosome (genome) bearing a single gene III, which accepts foreign DNA inserts and encodes a single type of pIII molecule. The foreign peptide encoded by the insert is theoretically displayed on all five pIII molecules on a virion (though in practice proteases in the host bacterium often remove the foreign peptide from some or even most copies of pIII, especially if the foreign peptide is large). Similarly, type 8 (see Figure 2) and type 6 vectors (not shown) display foreign peptides on every copy of pVIII and pVI, respectively (no type 6 vectors have been reported). As mentioned above, only short foreign peptides can be displayed on every copy of pVIII; even so, the peptide comprises a substantial fraction of the virion's mass and can dramatically alter its physical and biological properties. 29,34,35

In a type 88 vector, the phage genome bears two 
genes VIII, encoding two different types of pVIII 
molecule; one is ordinarily recombinant (i.e., bears a 
foreign DNA insert) and the other wild-type. The 
resulting virion is a mosaic, its coat composed of both
wild-type and recombinant pVIII molecules (the 
former usually predominating). This allows hybrid 
pVIII proteins with quite large foreign peptides to 
be displayed on the virion surface, even though the 
hybrid protein by itself cannot support phage as- 
sembly. Similarly, a type 33 vector bears two genes 
III, one of which is recombinant. 

A type 8+8 system differs from a type 88 system 
in that the two genes VIII are on separate genomes. 
The wild-type version is on a phage (usually called 
the "helper" phage), while the recombinant version 
is on a special kind of plasmid called a "phagemid." 36,37
Like other plasmids used in recombinant DNA re- 
search, a phagemid carries a plasmid replication 
origin that allows it to replicate normally in an E. 
coli host and an antibiotic resistance gene that allows 
plasmid-bearing host cells to be selected. But it also 
carries a filamentous phage replication origin, which 
is inactive until the cell is infected with the helper 
phage. Then the phage replication protein acts not 
only on the phage origin on the helper phage DNA 
but also on the phage origin on the phagemid DNA. 
Two types of progeny virions are thus secreted: 
particles carrying helper phage DNA and particles 
carrying phagemid DNA. Both these virions, like the 
type 88 virions, are mosaics, whose coats are com- 
posed of a mixture of recombinant and wild-type 
pVIII molecules. When a phagemid virion infects a 
cell, the cell acquires the antibiotic resistance carried 
by the phagemid. When a helper phage virion infects 
a cell, the cell goes on to produce progeny helper 
virions in the normal way; the progeny virions, unlike 
the original infecting virion, are not mosaic, since the 
helper carries only a single gene VIII. Type 3+3 and 
6+6 systems are like type 8+8 systems, except that 
the phagemid carries an insert-bearing recombinant 
gene III or VI, respectively, rather than VIII. The 
recombinant pIII encoded by a type 3+3 phagemid is usually missing the N-terminal domain (as in the third diagram in Figure 1), since cells expressing this domain are resistant to superinfection by helper phage.
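The naming scheme amounts to counting copies of the display coat-protein gene on each genome: repeated digits for copies on one genome, a "+" between genomes. A minimal Python sketch, assuming a simplified genome model; the `Genome` class and `system_type` function are hypothetical illustrations, not an established tool.

```python
from dataclasses import dataclass

# Digit used for each display coat-protein gene in the type name.
GENE_DIGIT = {"III": "3", "VI": "6", "VIII": "8"}


@dataclass
class Genome:
    """A phage or phagemid genome, reduced to the coat-protein genes it carries."""
    name: str
    coat_genes: list[str]


def system_type(genomes: list[Genome], display_gene: str) -> str:
    """Name a display system: one digit per copy of the display gene on a genome,
    with '+' separating copies that sit on different genomes."""
    digit = GENE_DIGIT[display_gene]
    parts = [digit * g.coat_genes.count(display_gene) for g in genomes]
    return "+".join(p for p in parts if p)


WILD_TYPE = ["III", "VI", "VII", "VIII", "IX"]  # coat genes of a phage genome

type_3 = [Genome("vector", WILD_TYPE)]               # its single gene III is recombinant
type_88 = [Genome("vector", WILD_TYPE + ["VIII"])]   # extra recombinant gene VIII
type_8_plus_8 = [Genome("phagemid", ["VIII"]), Genome("helper", WILD_TYPE)]

for name, system, gene in [("type_3", type_3, "III"),
                           ("type_88", type_88, "VIII"),
                           ("type_8_plus_8", type_8_plus_8, "VIII")]:
    print(name, "->", system_type(system, gene))
```

Only the gene carrying the foreign insert is counted, which is why a helper phage that also carries gene III still makes the pVIII phagemid system "8+8" rather than something mixed.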

Most phage display vectors are designed to be 
introduced into E. coli cells as naked DNA by 
electroporation, 38 which is particularly well-suited to 
making very large libraries. Special display vectors 
that can be packaged in vitro into phage λ particles
have also been reported. 39,40 

III. Types of Displayed Peptides and Proteins

The most common type of phage-display construct is the "random" peptide library, an outgrowth of the



Table 1. Random Peptide Libraries

random peptide  type    no. of clones refs        N-terminal sequence(a)
6-mer           3       2 × 10^?      142         ADGAX6GAAG-AETVE
15-mer          3       2 × 10^7      161         AEX15PPPPPP-AETVE
6-mer           3       3 × 10^8      141         X6GG-TVE
9-mer           8+8     4 × 10^7      26          AEG-EFX9-DPAK
10-mer          3       4 × 10^?      162         ADVAX10AASG-AETVE
6-mer           3                     61          X6GGG-AETVE
6-mer           3       nr            61          AE-CX6CGG-TVE
9-mer           3       2.4 × 10^7    163         AE-LGGGGX9GGGGVP-
15-mer          3       4 × 10^7      147         ADGAX15GAAG-AETVE
6-mer           3       8.6 × 10^?    164         A-EGXCX4CXSYIEGRIV-ETVE
9-mer           8+8     2.5 × 10^7    105         AEG-EFCX9CG-DPAK
36-mer          3       2 × 10^8      165         S(S/R)X18PGX18SRPAR-TVE
8-mer           3       1.4 × 10^9    62          X8ASGSA-
12-mer          3       5 × 10^8      62          X12ASGSA-
5-mer           3+3     5 × 10^6      86          GPGGX5GGPG-
5-mer           3+3     2 × 10^6      86          GPAAX5AAPG-
20-mer          3       1.5 × 10^8    166         ADGAX20GAAG-AETVE
10-mer          3       2 × 10^6      167, 168    ADASSGAX10SALSGSG-AETVE
15-mer          3       5 × 10^7      104         nr
7-mer           3       4.5 × 10^9    109         ADGACX7CGAAG-AETVE
6-mer           3       1 × 10^7      53          X6PNDKYEPFPPPPAA-AE
6-mer           3       2.5 × 10^9    137         AE-GX6G-TVE
6-mer           3       2.0 × 10^9    137         AE-X6PPIPG-TVE
6-mer           3       5.8 × 10^?    137         AE-RSLRPLX6G-TVE
6-mer           3       3.1 × 10^?    137         AE-PPPYPPX6-TVE
6-mer           3       2 × 10^8      88          YGGFLGACLEPYTACDSSGGSGX6(b)
5-mer           3       7.5 × 10^7    100         AE-X5RPLPPLPPP-TVE
5-mer           3       5.4 × 10^7    100         AE-RSLRPLPPLPX5-TVE
5-mer           3       2.2 × 10^7    100         AE-GAAPPLPPRX5-TVE
5-mer           8+8     9.7 × 10^6    115         AEG-DDPYKCPECGKSFSQKX2LX2HQXTHTG-DDPA
6-mer           3       8.6 × 10^6    110
6-mer           3       8.6 × 10^?    169         nr
15-mer          3       5.7 × 10^8    169
9-mer                   1 × 10^?      59
4-mer/Cys       3       3.8 × 10^8    170         AE-CX4CIEGRGG-
5-mer/Cys       3       2.4 × 10^8    170         AE-CX5CIEGRGG-
6-mer/Cys               6.1 × 10^8    170         AE-CX6CIEGRGG-
10-mer          3       2 × 10^?      111         X10GG-TVE
18-mer          3       4 × 10^?      111         X9GAX9GAAGGAGAGAG-TVE
8-12-mer/Cys    3       3 × 10^8      111         X2CX4-8CX2GAAGGAGAGAG-TVE
30-mer          nr      1 × 10^9      127         nr
20-mer          88      1 × 10^9      45          AX10HX10GGSE-AEGD
(3+6)-mer       33      1 × 10^8      116         (1-37)-X3-(41-59)-X6-(66-74)-(b)
(5+4)-mer       3       2 × 10^8      114         (1-18)-X5-(24-80)-X4-(85-106)-(b)
6-mer           3+3     8 × 10^7      99          AAQPAMA-(1-7)-X3FX3-(15-61)(b)
35-mer          3       5 × 10^8                  S(S/R)X20(Y/H/N/D)A(I/M/T/N/K/S/R)X15SRIEGRARPSR-(b)
40-mer          3       1 × 10^8      57          S(S/R)X20GCGX20SRIEGRARPSR(b)
                88      8 × 10^?      171         X6A-AEGD
15-mer          88      1.3 × 10^9    171         X15A-AEGD
30-mer          88      2.5 × 10^?    171         X30A-AEGD
16-mer          88      2.5 × 10^8    171         X8CX8A-AEGD
16-mer          88      1.2 × 10^8    171         X15CNA-AEGD
16-mer                                            XCX15A-AEGD
6-mer           88      2.2 × 10^?    171         XCX4CXA-AEGD
8-mer           88      1 × 10^10     171         XCX6CXGGP-AEGD
8-mer           88      1.5 × 10^8    171         XCX6CXA-AEGD
10-mer          88      1.5 × 10^9    171         XCX8CXA-AEGD
9-mer           88      5 × 10^8      171         act-XCCX3CX5C-act(b)
                8       1.5 × 10^9    29          A-X?-DPAK

(a) Foreign sequences are set off by hyphens; nr = not reported; X = any amino acid. (b) Underlined residues represent the thrombin receptor tether region, 53 epitope for monoclonal antibody used for separation of phage resistant to proteases, 88 tendamistat scaffold, 116 cytochrome b562 scaffold, 114 minibody scaffold, 99 factor Xa protease cleavage site, 57 and α-conotoxin scaffold. 171



synthetic "mimotope" strategy of Geysen and his co-workers; 41,42 such libraries are listed in Table 1. In
this case, the DNA inserts are derived from "degen- 
erate" oligonucleotides, which are synthesized chemi- 
cally by adding mixtures of nucleotides (rather than 
single nucleotides) to a growing nucleotide chain. In 
the degenerate sequence NNKNNKNNKNNK, for example, each N is an equal mixture of A, G, C, and T; each K is an equal mixture of G and T; each NNK is a mixture of 32 triplets that include codons for all 20 natural amino acids; and the entire 12-base sequence is an equimolar mixture of over a million (32^4) different molecular species collectively encoding all 160 000 (20^4) possible 4-residue peptides. Degeneracy at the level of whole codons, rather than single nucleotides, can give a less biased representation of
Table 2. Proteins Displayed on Filamentous Phages



Smith and Petrenko 



protein: type (refs)

genomic libraries
  Staphylococcus aureus DNA: 3+3, 8+8 (172, 173)
cDNA libraries
  Aspergillus fumigatus: 3+3(a) (174, 175)
  Ancylostoma caninum: 6+6 (30)
fragments of proteins
  β-galactosidase: 3 (25)
  bluetongue virus capsid protein VP5: 3 (136)
  plasminogen-activator inhibitor 1: 3+3 (176)
  RNA polymerase II: 3+3 (132)
enzymes
  alkaline phosphatase: 3 (177); 8+8 (178); 3+3(a) (31); 3+3 (179)
  trypsin: 3+3, 8+8, 33, 88 (90, 180)
  prostate specific antigen: 3+3 (181)
  β-lactamase: 3 (85, 182-184)
  cytochrome b562
  glutathione transferase: 3+3 (185)
  staphylococcal nuclease: 3 (186); 3+3 (187)
  lysozyme: 3+3 (179)
hormones
  human growth hormone: 3+3 (112, 188)
  atrial natriuretic peptide: 3+3 (189); 3+3 (190)
  angiotensin: 33, 3+3 (32)
  endothelin: 33 (116)
inhibitors
  bovine pancreatic trypsin inhibitor: 88 (130); 3 (58); 3 (96)
  plasminogen activator inhibitor: 3+3 (191, 192)
  Kunitz domain of Alzheimer's amyloid β-protein precursor: 3+3 (64, 98)
  cystatin: 3+3 (193)
  ecotin: 3+3 (89, 90)
  tendamistat: 33 (116)
  lipoprotein-associated coagulation inhibitor: 3 (68)
toxins
  Aspergillus fumigatus ribotoxin: 3+3(a) (31)
receptors
  IgE receptor (α subunit): 3+3 (195)
  protein A (B domain): 3+3 (196)
  IgG binding domain from group G Streptococcus: 3 (117, 197, 198)
  T cell receptor: 8+8 (199)
  CD4 domains 1 and 2: 88 (200)
ligands
  ligands for Src homology 3 domain: nr(b) (137, 100)
  urokinase plasminogen activator fragment 13-32: 3 (51)
  thrombin receptor activating peptide: 3 (53)
  substance P: 3 (53)
  neurokinin A: 3 (53)
  neurokinin B: 3 (53)
epitopes and antigens
  epitope of malaria parasite Plasmodium falciparum: 8 (28); 8+8 (126); 3 (25)
  chlamydial epitope: 88 (107)
DNA and RNA binding proteins
  zinc fingers: 3 (153); 3+3 (154, 156); 33 (155, 157-159)
  U1A protein: 3+3 (201)
enzyme substrates
  protease substrates: 3+3 (86, 87)

  interleukin 3: 3 (202)
  ciliary neurotrophic factor: 3+3 (43, 203)
  interleukin-6: 3+3 (204)

(a) Displayed through a leucine zipper fastener (section II.B).
(b) nr = not reported.




the amino acids in the random peptides. 43-45 A typical random peptide library has about a billion phage clones, enough to represent most of the 64 million possible 6-mers but far too small to represent the 3 × 10^19 possible 15-mers.
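The NNK and library-size numbers above can be reproduced with a few lines of Python. This is a sketch under idealized assumptions: perfectly equimolar nucleotide mixtures, clones drawn independently (a Poisson approximation), and no account of the single NNK stop codon, translation efficiency, or growth biases. The codon-count spread (three NNK codons for Leu, Ser, and Arg versus one for many other residues) is one way to see why degeneracy at the level of whole codons can give a less biased representation.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# NNK: N = equimolar A/C/G/T, K = equimolar G/T.
NNK_CODONS = ["".join(c) for c in product("ACGT", "ACGT", "GT")]
codons_per_aa = Counter(CODON_TABLE[c] for c in NNK_CODONS)
amino_acids = set(codons_per_aa) - {"*"}

print(len(NNK_CODONS), "NNK codons ->", len(amino_acids), "amino acids +",
      codons_per_aa["*"], "stop codon (TAG)")
print(32 ** 4, "DNA species encode", 20 ** 4, "tetrapeptides")
print(f"{20 ** 6:,} possible 6-mers; {20 ** 15:.1e} possible 15-mers")

# Codon-level bias: Leu, Ser and Arg have 3 NNK codons, many residues only 1.
p_best = (max(codons_per_aa[a] for a in amino_acids) / 32) ** 6
p_worst = (min(codons_per_aa[a] for a in amino_acids) / 32) ** 6
print(f"most vs least favoured 6-mer: {p_best / p_worst:.0f}-fold")

# Expected clones encoding the least favoured 6-mer in a 10^9-clone library,
# treating clones as independent draws (Poisson approximation).
expected = 1e9 * p_worst
print(f"expected copies {expected:.2f}; P(present) {1 - math.exp(-expected):.2f}")
```

Under these assumptions a billion-clone library still has a substantial chance of missing the least favoured 6-mers, consistent with "most" rather than "all" 6-mers being represented.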

Table 2 lists constructs that display all or part of 
natural peptide or protein domains, rather than 
random peptides. In "genomic" libraries, the inserts 
are fragments of total chromosomal DNA; thus, all 
coding sequences in the organism's genetic comple- 
ment (i.e., "genome") are potentially represented 
among the displayed peptides. Similarly, in cDNA 
libraries the inserts are DNA copies of messenger 
RNAs (mRNAs) extracted from some tissue or cell 
population; again, a huge diversity of coding se- 
quences is potentially represented in a sufficiently 
large cDNA library. In the remaining constructs in 
Table 2, the phages display all or part of a specific 
peptide or protein domain. In many cases, some 
positions in the displayed domain are "randomized" 
in some way to create a library of sequence variants, 
usually with an eye to selecting rare clones with 
enhanced function, or clones in which the displayed 
domain has acquired a new function as a result of 
mutation. 

IV. Selection 

A. General Principles 

Selection consists of culling an initial population 
of phage-borne peptides to give a subpopulation with 
increased "fitness" according to some user-defined 
criterion. In most cases, the input to the first round of selection is a very large initial library (10^9 clones, each represented by 100 particles on average, are typical numbers) and the selected subpopulation is a tiny fraction of the initial population (10^6 particles, say), fitter clones being overrepresented. This population can be "amplified" by infecting fresh bacterial
host cells, so that each individual phage in the 
subpopulation is represented by millions of copies in 
the amplified stock. The amplified population can 
then be subjected to further rounds of selection 
(perhaps accompanied by mutagenesis) to obtain an 
ever-fitter subset of the starting peptides. 
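How repeated rounds concentrate a rare fit clone can be seen in a deterministic toy model. It assumes (hypothetically) that one fit clone starts at a frequency of 10^-9, that it survives each round with a 10% yield while every other clone survives at a 10^-6 background yield, and that amplification multiplies all survivors equally; mutagenesis is ignored.

```python
def select_and_amplify(freq: float, yield_fit: float, yield_other: float) -> float:
    """Frequency of the fit clone after one round of selection plus amplification.
    Amplification is assumed to multiply every surviving clone equally."""
    fit = freq * yield_fit
    other = (1.0 - freq) * yield_other
    return fit / (fit + other)


def run_selection(freq0: float, yield_fit: float, yield_other: float,
                  rounds: int) -> list[float]:
    """Return the fit-clone frequency before selection and after each round."""
    history = [freq0]
    for _ in range(rounds):
        history.append(select_and_amplify(history[-1], yield_fit, yield_other))
    return history


# Illustrative, assumed values: one fit clone among 10^9, a 10% yield for that
# clone and a 10^-6 (background) yield for every other clone.
history = run_selection(1e-9, yield_fit=0.1, yield_other=1e-6, rounds=3)
for rnd, freq in enumerate(history):
    print(f"after round {rnd}: fit-clone frequency {freq:.3g}")
```

With these numbers each round enriches the fit clone by about 10^5-fold, so it rises from one in a billion to roughly 90% of the population after two rounds, in line with the two to three rounds typical in practice.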

There are two pivotal parameters of selection, 
which can often be manipulated to some extent in 
order to enhance the efficacy of selection. Stringency 
is the degree to which peptides with higher fitness 
are favored over peptides with lower fitness; yield is 
the fraction of particles with a given fitness that 
survive selection. The ultimate goal of selection is 
usually to isolate peptides with high fitness, but this 
does not mean that stringency should be increased 
without bound, since increasing stringency usually
entails decreased yield. High yield of the fittest
clones is of paramount importance in the very first 
round of selection, whose input consists of all clones 
in a very large initial library. Using the typical 
numbers in the previous paragraph, suppose that 
each clone in the library—including the very fittest 
clones that are the desired end-product of selection— is 
represented by only 100 particles on average. If the 
yield for the fittest clones is not greater than 1%, such 
clones have a good chance of being lost and of course 
can never be recovered. Those clones that do survive 
the first round of selection are amplified and are thus 
represented by millions of phages each in subsequent 
rounds; yield can then safely be decreased in favor 
of high stringency. There is a limit to stringency, 
however, because in practice selection techniques are 
imperfect and there is an unavoidable background 
yield of all phages regardless of their fitness. If 
stringency is set too high, the yield of a specifically 
selected phage will fall far below the background of 
a nonspecifically isolated phage, and all power of 
discrimination in favor of high fitness is lost. 
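To put numbers on the first-round argument: if each particle of a clone survives independently with probability equal to the yield, a clone represented by 100 particles is lost with probability (1 - yield)^100, about 37% at 1% yield. The sketch below compares that binomial estimate with a Poisson variant in which the clone's copy number is itself Poisson-distributed; both are modeling assumptions rather than measured behavior.

```python
import math


def p_lost_binomial(particles: int, yield_fraction: float) -> float:
    """Probability that none of a clone's particles survives one round, if the
    clone has exactly `particles` copies and each survives independently."""
    return (1.0 - yield_fraction) ** particles


def p_lost_poisson(mean_particles: float, yield_fraction: float) -> float:
    """Same probability if the clone's copy number in the library is itself
    Poisson-distributed with the given mean (survivors are then Poisson too)."""
    return math.exp(-mean_particles * yield_fraction)


for y in (0.01, 0.05, 0.10):
    print(f"yield {y:.0%}: P(clone lost) = {p_lost_binomial(100, y):.3g} "
          f"(fixed 100 copies), {p_lost_poisson(100, y):.3g} (Poisson, mean 100)")
```

Raising the first-round yield from 1% to 10% cuts the chance of losing such a clone from about one in three to a few in 10^5, which is why yield rather than stringency matters most in that round.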

B. Affinity Selection 

As mentioned already in section II.A, by far the 
most common selection pressure imposed on phage- 
displayed peptide populations is affinity for a target 
receptor. Affinity selection is ordinarily accomplished 
by minor modifications of standard affinity purifica- 
tion techniques in common use in biochemistry. 
Thus the receptor is tethered to a solid support, and 
the phage mixture is passed over the immobilized 
receptor. Those phages— usually a tiny minority— 
whose displayed peptides bind the receptor are 
captured on the surface or matrix, allowing unbound 
phages to be washed away. Finally, the bound 
phages are eluted in a solution that loosens receptor-peptide binding, yielding an "eluate" population of phages that is greatly enriched (often a millionfold or more) for receptor-binding clones. The eluted
phages are still infective and are propagated simply 
by infecting fresh bacterial host cells, yielding an 
"amplified" eluate that can serve as input to another 
round of affinity selection. Phage clones from the 
final eluate (typically after 2-3 rounds of selection) 
are propagated and characterized individually. The 
amino acid sequences of the peptides responsible for 
binding the target receptor are determined simply 
by ascertaining the corresponding coding sequence 
in the viral DNA. In general, high stringency is 
favored by low densities of the target receptor 46 and 
by monovalent display of the foreign peptide; 47 high 
stringency is almost invariably accompanied by rela- 
tively low yield. In the remainder of this subsection, 
we will review ways in which these general principles 
have been implemented. 
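The trade-off between stringency and yield can be illustrated with simple equilibrium binding. This sketch assumes 1:1 binding of a monovalent displayed peptide to receptor in solution, with receptor in large excess over phage, and two hypothetical peptides with dissociation constants of 1 nM and 100 nM. Receptor concentration in solution is only an analogy for receptor density on a surface, and avidity effects from multivalent display, which would blur the discrimination, are not modeled.

```python
def fraction_bound(receptor_nM: float, kd_nM: float) -> float:
    """Equilibrium fraction of a displayed peptide bound to receptor, for simple
    1:1 binding with free receptor in large excess over phage."""
    return receptor_nM / (receptor_nM + kd_nM)


KD_TIGHT, KD_WEAK = 1.0, 100.0  # hypothetical dissociation constants (nM)

for receptor in (1000.0, 10.0, 0.1):  # receptor concentrations (nM)
    tight = fraction_bound(receptor, KD_TIGHT)
    weak = fraction_bound(receptor, KD_WEAK)
    print(f"[R] = {receptor:g} nM: yield tight {tight:.3f}, weak {weak:.4f}, "
          f"selectivity {tight / weak:.1f}x")
```

At high receptor concentration both peptides are captured almost completely (selectivity near 1x); as the concentration drops below both Kd values the selectivity approaches the 100-fold ratio of the Kd values, but the yield of even the tight binder falls well below 10%.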



398 Chemical Reviews, 1997, Vol. 97, No. 2 






The solid supports to which target receptors are 
tethered can be classified into surface supports
(polystyrene dishes, 24 impermeable plastic beads, 47
nitrocellulose membranes, 48 and paramagnetic
beads 49) and permeable beaded agarose gels. 50 Per-
meable agarose beads are convenient to use and have
a very high capacity per unit volume. However, it
seems unlikely that phage particles, whose long
dimension (~1 μm) is orders of magnitude larger than
the average diameter of the pores of an agarose gel,
can diffuse far into the interior of a bead; for this
reason, only receptors tethered at the very surface
of a bead may actually be effective at capturing phage.

Receptors can be directly attached to the solid 
support by chemical coupling 47,50 or noncovalent
adsorption to a hydrophobic plastic surface. 24 Alter- 
natively, receptor molecules can be biotinylated and 
allowed to bind to a surface that has already been 
coated with avidin or streptavidin, thereby attaching 
them indirectly through the superstrong biotin-avidin
or biotin-streptavidin bond. 25 There are
numerous other ways of indirectly attaching the 
receptor to the solid support; indeed whole cells in 
suspension or attached to a culture dish can be used 
as a solid support to select ligands for cell-surface 
receptors. 51-54

Indirect attachment via a biotin moiety allows a 
"two-step" mode of capture: 25 in the first step, the 
phage mixture is reacted with biotinylated receptor 
in homogeneous solution; in the second, the mixture 
is reacted with streptavidin-coated solid support in 
order to capture those phages whose displayed pep- 
tide bound the biotinylated receptor during the first 
step. In principle, at least, two-step capture allows 
the kinetics of the binding reaction to be controlled 
without the complications attendant on surface reac- 
tions. 

After the capture step (whether part of a one- or 
two-step procedure), the solid support is washed to 
remove unbound phages and eluted under conditions 
that release the bound phages without impairing 
their infectivity. Nonspecific elution conditions are 
intended to weaken receptor-peptide interactions
without regard to their specificity. They exploit the 
high resistance of filamentous phage to denaturation 
by acidic buffers with pH values down to 2.2, 24 alkaline
buffers such as 0.1 M triethylamine, 55 urea concen- 
trations as high as 6 M at pH 2.2 (G.P.S., unpub- 
lished), and proteases such as trypsin 56 and factor 
Xa. 57 Gradients of acidity 58 or other agents 59 have 
been used in an attempt to elute phages in order of 
increasing affinity. This approach should be used 
with caution, however, since in most cases it is not 
clear a priori how closely the affinities of receptor- 
peptide bonds correlate with their resistance to 
denaturing conditions. 

Specific elution seeks to release phages that are 
bound to the target receptor's binding site, without 
releasing phages that are bound for some other 
reason, for example, by interaction with a contami-
nant, or with the carrier protein that is often used
to block nonspecific adsorption sites on the solid 
support after the target receptor itself has been 
immobilized. In competitive elution, a known soluble 
ligand for the receptor competes with phage for
binding to immobilized receptor. 60-64 This is a two-
stage process: the phage-borne peptides must first
dissociate spontaneously from the immobilized receptor, then
the competitor binds the receptor binding site thus 
freed, reducing its availability for rebinding phage- 
borne peptide. Thus if the time course of dissociation 
is long on the scale of the experiment, competitive 
elution will fail. Noncompetitive elution, in contrast, 
relies on a compound that specifically loosens binding 
by the receptor without binding to its binding site, 
and without weakening binding interactions in gen- 
eral. For instance, phage bound to a calcium-de- 
pendent receptor can be eluted with the calcium 
chelator EGTA; 65,66 this greatly increases the speci-
ficity of elution, since only rarely would a nonspe- 
cifically bound phage happen to be held in a calcium- 
dependent fashion. 
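Why slow dissociation defeats competitive elution can be seen from first-order kinetics. In this sketch (a simplified model, not taken from the review) the competitor is assumed to block all rebinding, so the fraction of phage released during an elution of length t is 1 - exp(-k_off t). The k_off values and the 1 h elution time are hypothetical.

```python
import math


def fraction_released(k_off, elution_time):
    """Fraction of bound phage released by competitive elution.

    Assumes first-order dissociation (rate constant k_off, per second) and
    that the competitor prevents all rebinding, so release is limited only
    by spontaneous dissociation: 1 - exp(-k_off * t).
    """
    return -math.expm1(-k_off * elution_time)


elution_time = 3600  # hypothetical 1 h elution, in seconds
for k_off in (1e-2, 1e-4, 1e-6):  # invented values, s^-1
    half_life_h = math.log(2) / k_off / 3600
    released = fraction_released(k_off, elution_time)
    print(f"k_off={k_off:g}/s, half-life {half_life_h:.3g} h: "
          f"{released:.1%} released")
```

With these invented rate constants, a complex with a half-life of about a minute is essentially fully eluted, one with a half-life of about 2 h is roughly 30% eluted, and one with a half-life of days is barely eluted at all.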

It is actually not necessary to elute the captured 
phage at all. Simply adding fresh bacterial host cells 
to the solid support allows the captured phage to 
infect cells and thus be propagated. 67 The yield is 
generally low (1-10% of the yield from elution;
G.P.S., unpublished), but in all but the first round 
of selection is probably sufficient to ensure retention 
of binding clones (see above). So far, this "elution 
by infection" has been reported only for peptides 
displayed on pVIII; it is not clear how well it will 
work if the peptide is displayed on pIII.

In some cases, the unamplified eluate is directly 
subjected to another round of selection. 68 Because 
the yields of even the highest-affinity clones at each 
round of selection seldom approach 100%, however, 
overall yields decline sharply with successive rounds; 
so it is important to start with an initial population 
in which each clone is represented by sufficient 
numbers of particles to guard against extinction if it 
happens to be a good binder. Also, some elution 
conditions seem to alter the phages physically
(without impairing infectivity) so that the
background yield in the next round is much higher 
than with amplified phages. 25 
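The extinction risk described above can be estimated with a simple binomial model. The sketch assumes, hypothetically, that every particle of a clone survives each unamplified round independently with the same probability `per_round_yield`. The chance that none of `copies` particles survives `rounds` rounds is then (1 - yield^rounds)^copies.

```python
import math


def extinction_probability(copies, per_round_yield, rounds):
    """Chance that no particle of a clone survives `rounds` unamplified rounds.

    Assumes each particle survives each round independently with probability
    `per_round_yield` (a simplifying, hypothetical model).
    """
    survive_all = per_round_yield ** rounds
    return (1.0 - survive_all) ** copies


def copies_needed(max_risk, per_round_yield, rounds):
    """Smallest starting copy number keeping extinction risk <= max_risk."""
    survive_all = per_round_yield ** rounds
    return math.ceil(math.log(max_risk) / math.log1p(-survive_all))


# Hypothetical: 10% yield per round, three rounds without amplification.
for copies in (100, 10_000):
    print(copies, f"{extinction_probability(copies, 0.1, 3):.2g}")
print("copies for <=1% risk:", copies_needed(0.01, 0.1, 3))
```

Under these assumptions a clone represented by 100 particles is about 90% likely to be lost after three unamplified rounds, whereas roughly 4,600 particles keep the risk near 1%.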

Several groups 69-71 have introduced a promising 
variant of affinity selection that does not rely on 
physical capture on a solid support. Here, the 
peptide is displayed on a mutant version of coat 
protein pIII that is missing its N-terminal domain,
as in the third line of Figure 1. Since this domain is 
required for infectivity, these particles are noninfec- 
tive. Infectivity can be restored by attaching the 
missing N-terminal domain to a receptor that binds 
the phage-borne peptide. Therefore, only phage 
displaying peptides that bind the receptor are infec- 
tive and are thus amplified. This sets up a sort of 
"automatic" evolution in which an initially highly 
diverse population of peptides evolves toward higher 
affinity for the receptor as the phage grow in the 
presence of host bacteria. There are many variations 
on this theme. For instance, if each phage clone in 
the initial library encodes both a randomized peptide 
on the defective pIII and a randomized receptor fused
to the N-terminal domain, this system can be used 
to isolate peptide-receptor pairs with affinity for 
each other. 71 

Because selected phage are in the end cloned and 
characterized one by one, it is feasible to use a 
complex mixture of receptors, rather than a single
receptor, to capture phage. For instance, total hu- 
man serum immunoglobulin, comprising hundreds or 
thousands of individual antibody specificities, can be 
used to select a diversity of peptides, each recognized 
by one of the specificities. This is the basis of "epitope 
discovery", 72-83 a strategy for identifying diagnostic
peptides and synthetic vaccine components (section 
VIII.E). 

The progress of affinity selection through succeed- 
ing rounds is ordinarily reflected in increasing affin- 
ity of individual phage clones or of entire eluate 
populations for the target receptor. The affinity of 
individual clones or entire eluate populations can be 
assessed quantitatively by standard enzyme-linked 
immunosorbent assay (ELISA). 14 Alternatively, a 
few hundred individual clones from an eluate can be 
sampled on "plaque lifts" and tested qualitatively for 
ability to bind the receptor. 72 

C. Selection for Traits Other than Affinity 

In principle, at least, phage-borne peptides might 
be selected on the basis of fitness criteria other than 
affinity for a target receptor. Thus, for example, 
Petrenko and co-workers selected phage clones from 
a type 8 library that are resistant to extraction with 
chloroform. 29 In practice, however, almost all selec- 
tion procedures have involved affinity at least indi- 
rectly. Suicide inhibitors 84 are a case in point, as 
illustrated in experiments of Soumillion and col- 
leagues. 85 They incubated a library of phage display-
ing variants of β-lactamase with a special β-lactam
substrate coupled to biotin. This substrate is con-
verted by β-lactamase to a highly reactive form that
couples itself to the β-lactamase enzyme. Thus phage
displaying catalytically active β-lactamase molecules
become marked with a biotin moiety and can be 
specifically captured out of a vast mixture of unmodi- 
fied phage by their high affinity for immobilized 
streptavidin. 

Affinity has similarly been used indirectly to select 
for protease substrates. 86-90 In these projects, a
peptide or protein domain with high affinity for a 
convenient receptor is fused to pIII coat protein
through a randomized amino acid sequence. The 
phage library is bound to a solid support coated with 
the receptor and then exposed to the protease. Those 
phages whose randomized amino acid sequence hap- 
pens to be a substrate for the protease are released 
from the solid support and can be propagated by 
infecting fresh cells. By sequencing the randomized 
peptide's coding sequence within the viral DNA in 
these phage clones, amino acid sequences that are 
effective substrates for the protease can be ascer- 
tained. 

To select peptides that home to the brain, Pas- 
qualini and Ruoslahti 91 injected phage libraries into 
the tail vein of mice, recovered phages from brain or 
kidney a few minutes later, amplified the phages by 
infecting fresh bacterial host cells, and reinjected to 
initiate the next round of selection. Peptides capable 
of mediating selective localization of phage to brain 
and kidney blood vessels were identified in this way 
and showed up to 13-fold selectivity for these organs.
It is likely that specific homing is based on affinity
for a saturable tissue receptor of some sort, since it
could be blocked by simultaneous administration of
the free peptide.

Figure 3. Distributions of amino acids observed at six
randomized positions in phages affinity-selected with ri-
bonuclease S-protein. Amino acids classified as motif
residues are labeled with letters; they all have frequencies
of at least 13% at the indicated position, whereas nonmotif
residues have frequencies no greater than 8%. A motif
residue whose frequency is no greater than 25% of the
frequency of the next most abundant amino acid at that
position is classified as a minor motif residue and labeled
with a lower-case letter enclosed in parentheses; the
remaining motif residues are classified as major and are
labeled with upper-case letters.

D. Enrichment of Specific Sequence Motifs 

Increasing fitness is typically accompanied by 
emergence of a common "motif" in the amino acid
sequences of the selected peptides (sometimes more 
than one motif). For instance, Smith and co-work- 
ers 92 (also D. A. Schultz, J. E. Ladbury, G.P.S. and 
R. O. Fox, unpublished) used ribonuclease S-protein 
to affinity-select peptides from a library displaying 
50 million random hexapeptides. The incidence of 
the amino acids at each of the six randomized 
positions after three rounds of selection is graphed 
in Figure 3. It is evident that the sequence FNFE 
greatly predominates at positions 1-4 and that just
a few chemically similar amino acids dominate at 
positions 5 and 6 as well (V/I and V/I/L/M, respec- 
tively). Twelve of the 20 natural amino acids did not 
appear at any position in any of the clones, though 
they were present at roughly the expected frequen- 
cies in the initial library. Overall, the 6-mer motif 
(F/y)NF(E/v) (V/I) (V/I/L/M) is evident, where lower- 
case letters indicate minor motif residues (defined in 
the legend) and amino acids enclosed in parentheses 
are alternatives observed at a given position. 
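The classification rules in the Figure 3 legend can be written down directly. The sketch below is one reading of them, and its assumptions are flagged here. "Next most abundant amino acid" is taken to mean the residue ranked immediately above the one being classified, and frequencies between 8% and 13% (which the legend's data do not contain) are treated as nonmotif. The peptides are invented toy data built to exercise the rules; they are not the frequencies plotted in Figure 3.

```python
from collections import Counter

MOTIF_MIN = 0.13    # motif residues occur at >= 13% (Figure 3 legend)
MINOR_RATIO = 0.25  # minor if <= 25% of the next more abundant residue


def motif_pattern(peptides):
    """Summarize aligned selected peptides as a motif string.

    Major motif residues are upper case, minor ones lower case, and
    alternatives at one position are grouped in parentheses.
    """
    n = len(peptides)
    parts = []
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        # Most abundant first; ties broken alphabetically for determinism.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        labels, prev_freq = [], None
        for aa, c in ranked:
            freq = c / n
            if freq < MOTIF_MIN:
                break  # this and all rarer residues are nonmotif
            minor = prev_freq is not None and freq <= MINOR_RATIO * prev_freq
            labels.append(aa.lower() if minor else aa.upper())
            prev_freq = freq
        if not labels:
            parts.append("X")  # no motif residue at this position
        elif len(labels) == 1:
            parts.append(labels[0])
        else:
            parts.append("(" + "/".join(labels) + ")")
    return "".join(parts)


# Invented toy data: 20 aligned hexapeptides built column by column.
columns = [
    "F" * 17 + "Y" * 3,
    "N" * 20,
    "F" * 20,
    "E" * 17 + "V" * 3,
    "V" * 11 + "I" * 9,
    "V" * 8 + "I" * 6 + "L" * 3 + "M" * 3,
]
peptides = ["".join(col[i] for col in columns) for i in range(20)]
print(motif_pattern(peptides))  # (F/y)NF(E/v)(V/I)(V/I/L/M)
```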

It is noteworthy that this motif does not resemble 
any part of the amino acid sequence of S-protein's 
natural ligand, S-peptide; there is no way it could 
have been predicted by rational design, despite 35 
years of intensive work on the S-protein/S-peptide 
system. Similarly, Wrighton and colleagues 93 iso- 
lated a small peptide that is a full agonist of the much 
larger erythropoietin hormone, but shares no signifi- 
cant similarity with it at the amino acid sequence 
level. In general, emergence of entirely unexpected
motifs is a recurring theme in the results of selections 
from random peptide libraries and testimony to the 
power of selection to reveal bioactive structures that 
could not be discovered by rational design. 

V. Exploring the Fitness Landscape 

A. Sequence Space, Fitness Landscapes, and 
Sparse Libraries 

The ensemble of all possible combinations of amino 
acids at randomized positions in a library (e.g., of all 
64 million possible hexapeptides for a random 6-mer 
library) comprises an abstract geometric domain that
is commonly called "sequence space"; each individual 
sequence is thus a point in this sequence space. In 
this section, we will represent sequence space as a 
two-dimensional plane and individual sequences as 
points on that plane. We must not take the analogy 
to a map too literally, however: there is no clear 
relationship between the distance separating two 
sequences in any geometric representation— even a 
highly abstract one with multiple dimensions— and 
the resemblance of those sequences' physical and 
chemical properties. Still, it will be useful in what 
follows to suppress this complication and consider 
points that are close in our two-dimensional repre- 
sentation to represent similar amino acid sequences. 
More specifically, the two axes of the plane could 
represent the possible sequences in nonoverlapping 
subsets of the randomized positions which we will 
call regions 1 and 2; region 1 could be positions 1-4 
of a random octapeptide, for example, and region 2 
positions 5-8. 

Imagine, then, adding one more dimension— a third 
axis in our grossly simplified but heuristically useful 
representation. On this axis we plot the fitness of 
each of the sequences in sequence space according 
to the selection criterion being imposed on the peptide 
population. These closely spaced points form a 
surface overlying sequence space. Parts A and B of
Figure 4 illustrate two hypothetical "fitness land-
scapes" overlying two-dimensional sequence space.
The researcher's goal, in these terms, can be under- 
stood as searching through sequence space in order 
to find the highest point he or she can on the fitness 
landscape— the sequence that is fittest by the artifi- 
cially imposed selection criterion. In no practical case 
are the staggering amounts of data required to plot 
an actual empirical fitness landscape available; nev- 
ertheless, thinking about possible fitness landscapes 
is a mental device that can help in devising more 
efficient search strategies. 

The fitness landscapes depicted in Figure 4A,B 
differ in two important ways. First, the peak in 
Figure 4A is broad and smooth, whereas Figure 4B 
has multiple peaks, including a sharp one, that give 
it a more "rugged" character. Second, the landscape 
in Figure 4A has a special kind of symmetry. If we 
choose an arbitrary point in region 2, and keeping 
that sequence fixed plot fitness over the one-dimen- 
sional sequence space of region 1, we obtain a curve
(Figure 4C) that we will call a "transect" through the 
fixed point in region 2. This transect has the same 
relative shape (though very different absolute heights),
regardless of which region 2 sequence it passes
through. Similarly, all perpendicular transects 
through fixed points in region 1 have the same shape 
(in this case, a broader shape than in Figure 4C). This 
symmetry implies that the fitness F at each point in
sequence space can be written as the product
F = F1 × F2 of a fitness F1 contributed by region 1
and a fitness F2 contributed by region 2. Equivalently,
taking the logarithm of both sides of this equation
(log F = log F1 + log F2), we arrive at an alternative
definition of fitness to which regions 1 and 2 make
additive contributions. If, to give a concrete
example, fitness consists of affinity for a target 
receptor, the symmetry illustrated in Figure 4A 
means that overall affinity is the product of affinities 
contributed by regions 1 and 2 separately. And since 
the free energy of binding is proportional to the 
logarithm of affinity, this also means that the overall 
free energy of binding is the sum of free energies of 
binding contributed separately by regions 1 and 2. 
When such symmetry is present, we say that regions 
1 and 2 make "independent" or "additive" contribu- 
tions to fitness, and correspondingly we will call the 
fitness landscape itself additive. It is obvious that 
the landscape in Figure 4B is far from additive: a 
transect through a point in region 2 that lies close to 
the broad peak (transect A in Figure 4D) is com- 
pletely different from a transect lying closer to the 
sharp peak (transect B in Figure 4D). Whether or 
not a fitness landscape is additive in this sense 
depends on how the random positions are parsed into 
regions. For instance, if the random residues occur 
in two separate loops in a protein domain, the 
sequences in the two loops might well make inde- 
pendent contributions to fitness, whereas a subdivi- 
sion that groups residues from the two loops into a 
single region might result in a markedly nonadditive 
landscape. 
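The link between a multiplicative landscape, identical transect shapes, and additive free energies can be checked numerically. In this hypothetical sketch, regions 1 and 2 each have three possible sequences with invented affinity factors whose product is an association constant K. Free energies are computed as ΔG = -RT ln K at an assumed 298 K, and a small interacting landscape is included for contrast.

```python
import math

RT = 0.593  # kcal/mol at about 298 K (assumed temperature)


def additive_landscape(f1, f2):
    """Fitness table F[i][j] = f1[i] * f2[j] (multiplicative in affinity)."""
    return [[a * b for b in f2] for a in f1]


def transect(landscape, j):
    """Region 1 transect through region 2 point j, rescaled so its peak is 1."""
    values = [row[j] for row in landscape]
    peak = max(values)
    return [v / peak for v in values]


def free_energy(k):
    """Binding free energy (kcal/mol) from an association constant k."""
    return -RT * math.log(k)


# Hypothetical affinity factors for the three sequences in each region.
f1 = [1e3, 1e5, 1e4]
f2 = [10.0, 100.0, 5.0]
additive = additive_landscape(f1, f2)

# Every transect has the same relative shape...
shapes = [transect(additive, j) for j in range(len(f2))]
# ...and free energies add: dG(i, j) = dG1(i) + dG2(j).
dG = [[free_energy(k) for k in row] for row in additive]

# A nonadditive (interacting) landscape for contrast.
nonadditive = [[1e4, 1e2],
               [1e2, 1e6]]

print("additive transect peaks:", [s.index(1.0) for s in shapes])
print("nonadditive transect peaks:",
      [transect(nonadditive, j).index(1.0) for j in range(2)])
```

In the additive table every transect peaks at the same region 1 sequence, whereas in the interacting table the best region 1 sequence depends on which region 2 sequence is held fixed.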

A phage-display library can be seen as a collection 
of random points in sequence space. Only for the 
tiniest sequence spaces do these points represent all 
or most possible sequences. For example, there are
3 × 10^19 possible random 15-mers, whereas the
largest phage-display libraries comprise only about
10^10 individual clones, an exceedingly sparse sam-
pling on the scale of the relevant sequence space.
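The sparseness argument is simple arithmetic. The sketch below computes the size of sequence space and, assuming (unrealistically) that each clone is an independent, uniform draw, the expected fraction of sequence space present at least once in a library of N clones, 1 - exp(-N/S). Under that assumption even the 50-million-clone hexapeptide library described earlier would contain only about half of the 64 million possible hexapeptides.

```python
import math


def sequence_space_size(length, alphabet=20):
    """Number of possible peptides with `length` randomized positions."""
    return alphabet ** length


def expected_coverage(library_size, length, alphabet=20):
    """Expected fraction of sequence space represented at least once.

    Assumes each clone is an independent, uniform draw from sequence space,
    so coverage is about 1 - exp(-N/S); real libraries are biased.
    """
    ratio = library_size / sequence_space_size(length, alphabet)
    return -math.expm1(-ratio)  # 1 - exp(-ratio), accurate for tiny ratios


print(f"{sequence_space_size(6):,} hexapeptides")   # 64,000,000
print(f"{sequence_space_size(15):.3g} 15-mers")     # 3.28e+19
print(f"{expected_coverage(1e10, 15):.2g}")         # 3.1e-10
print(f"{expected_coverage(5e7, 6):.2f}")           # 0.54
```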

In principle, at least, selection can identify the 
fittest clone(s) in the library— the peptide(s) corre- 
sponding to the highest point(s) on the fitness land- 
scape; we will call such a clone an "initial champion." 
Because of the sparseness of the library, however, 
an initial champion's fitness may be far inferior to 
that of the globally fittest clone(s) in sequence space. 
This is especially likely if maximum fitness lies atop 
a sharp peak in a rugged landscape (Figure 4B), a
narrow topographical feature that may be missed 
altogether in a sparse sampling. The more clones in 
the library, and the less biased the representation 
of the different amino acids, 43-45 the less severe this
deficit is likely to be, underscoring the desirability 
of large libraries. Once constructed, a large, general- 
purpose library is an extremely valuable resource, 
since it can be replicated indefinitely by infecting 
fresh bacterial host cells and widely distributed for 
use in an unlimited number of projects. 
Figure 4. Fitness landscapes: (A, top left) a nonrugged, additive landscape; (B, top right) a rugged, nonadditive landscape;
(C, bottom left) a transect of the additive landscape in A through any sequence in region 2 (all transects have identical 
shapes); (D, bottom right) two transects of the nonadditive landscape in B, passing through two sequences in region 2 
(different transects can have markedly different shapes). 

B. Strategies for Exploring the Fitness 
Landscape 

A researcher who is not content with the fitness of 
the initial champion obtained from one library might 
of course try new libraries. But that is a woefully 
inefficient way of accessing additional points in 
sequence space, even leaving out of consideration the 
fact that construction of a new library is an arduous 
affair. Natural evolution suggests a much better 
strategy: introduce random mutations into a popula- 
tion that has already been subjected to selection, then 
select again from the resulting mutagenized popula- 
tion to obtain even fitter clones. In effect, the search 
through sequence space is concentrated in the close 
neighborhoods of clones that, having survived a 
previous round of selection, are enriched for se- 
quences with at least some level of fitness. 

The first round of selection favors clones at higher 
elevations in the fitness landscape and disfavors
clones at lower elevations. The survivors are then
propagated and mutagenized (mutagenesis methods 
will be discussed at the end of this section), each thus 
giving rise to a "clan" of variant descendants. As- 
suming mutagenesis is moderate, these descendants 
will lie close to their progenitor in sequence space,
some at higher elevation, some at lower. In the next 
round of selection, usually at higher stringency, fitter 
variants are again selected over less-fit ones. As the 
population experiences each successive cycle of mu- 
tagenesis and ever more stringent selection, more 
and more clans become extinct, while the remaining 
ones "climb" toward the tops of their local fitness 
peaks. 

The "greedy" strategy is one implementation of this 
program. First, the initial library is subjected to 
multiple rounds of increasingly stringent selection in 
the hope of selecting the very best clone— the initial 
champion. 43,94 This single clone is then mutagenized
to generate a single clan of variants, from which yet 
fitter clones are selected. And so on, until further 
rounds of stringent selection and mutagenesis yield 
no further improvement in fitness— i.e., until the 
summit of the local fitness peak is achieved. 

The greedy strategy is inherently risky, however, 
because the initial champion may lie on a local fitness 
peak of much lower elevation than the globally 
highest peak. Consider, for instance, a rugged fitness 
landscape like that in Figure 4B, where the highest 
peak has a very small footprint in sequence space and 
thus will be sampled very infrequently in the initial 
library, while a much lower peak has a much larger 
footprint and thus will be sampled frequently. It is 
highly plausible, in such a case, that the initial 
champion will happen to lie on a broad, low feature 
and that far superior sequences lie at completely 
different locations in sequence space— locations that 
are too far distant from the initial champion to be 
accessible by mutagenesis. Of course, the researcher 
will not know— certainly not in advance— that the 
fitness landscape is like this, but it may be prudent 
to design a search strategy that has a chance of 
overcoming this limitation. An attractive "non- 
greedy" approach is to relax the stringency of selec- 
tion in the first cycle, so that not only the initial 
champion but also many clones of inferior (but still 
above-average) fitness survive. 95 This larger sub- 
population has a chance of including "dark horses": 
clones that are inferior to the initial champion, but 
that lie near higher fitness peaks. This entire 
subpopulation is then mutagenized en masse to 
generate a new library, which is subject to the next 
round of selection. This scenario resembles natural 
evolution much more closely than does the greedy 
search. 
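The mutagenesis-selection cycle, and the difference between the greedy and nongreedy strategies, can be explored on a toy landscape. Everything in this sketch is hypothetical. Sequences are tuples over a four-letter alphabet, `toy_fitness` has a broad low peak and a narrow high peak (loosely mimicking Figure 4B), and a fixed `budget` of mutants per cycle is shared among the survivors. Setting `survivors=1` gives the greedy strategy; a larger value gives the nongreedy one. Parents are carried into the next cycle so that the best fitness never decreases.

```python
import random

LENGTH, ALPHABET = 8, 4  # toy sequence space of 4**8 = 65,536 sequences


def toy_fitness(seq):
    """Invented rugged landscape: a broad low peak and a narrow high peak."""
    d_low = sum(aa != 0 for aa in seq)    # distance from the broad peak
    d_high = sum(aa != 3 for aa in seq)   # distance from the narrow peak
    broad = 1.0 - d_low / LENGTH          # large footprint, summit 1.0
    narrow = {0: 2.0, 1: 1.5}.get(d_high, 0.0)  # tiny footprint, summit 2.0
    return max(broad, narrow)


def mutate(seq, rate, rng):
    """Redraw each residue at random (possibly unchanged) with probability rate."""
    return tuple(rng.randrange(ALPHABET) if rng.random() < rate else aa
                 for aa in seq)


def rank_key(seq):
    return (toy_fitness(seq), seq)  # deterministic tie-breaking


def evolve(library, survivors, rounds=6, budget=200, rate=0.15, seed=1):
    """Alternate selection and mutagenesis; survivors=1 is the greedy case."""
    rng = random.Random(seed)
    clan_size = max(1, budget // survivors)  # fixed search budget per cycle
    population = list(library)
    for _ in range(rounds):
        parents = sorted(set(population), key=rank_key, reverse=True)[:survivors]
        population = list(parents)  # parents kept, so best fitness never drops
        for parent in parents:
            population.extend(mutate(parent, rate, rng) for _ in range(clan_size))
    return max(population, key=rank_key)


library_rng = random.Random(0)
library = [tuple(library_rng.randrange(ALPHABET) for _ in range(LENGTH))
           for _ in range(200)]
champion = max(library, key=rank_key)
greedy = evolve(library, survivors=1)
nongreedy = evolve(library, survivors=20)
for name, seq in [("initial champion", champion), ("greedy", greedy),
                  ("nongreedy", nongreedy)]:
    print(f"{name:16s} {seq} fitness={toy_fitness(seq):.3f}")
```

Because the landscape, mutation rate, and budget are all invented, the sketch cannot say which strategy wins in practice; changing the seeds changes the outcome, which mirrors the uncertainty discussed above.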

Even if there is a dark horse in the initial library, 
there is no guarantee that its descendants will 
ultimately win the competition, since the descendants 
of the initial champion start with a competitive 
advantage; nor can we point to any case where a dark 
horse has actually been demonstrated empirically.
Furthermore, there is a disadvantage to the non- 
greedy approach: because limited search resources 
must be distributed among many neighborhoods in 
sequence space (one for each clone in the selected 
subpopulation), it is not possible to search the 
neighborhood of the initial champion nearly as 
thoroughly as in the greedy method. Perhaps, if the 
stakes are high enough, both approaches are worth 
trying. 

The "stepwise" or "iterative" strategy is another 
approach to searching sequence space. 64,96-100 Here,
the randomized positions are divided into two or more 
subsets or regions, like regions 1 and 2 in Figure 
4A,B. The sequence in one of the regions— region 2, 
say— is fixed, while the other region (region 1) is 
randomized; in terms introduced in the previous 
subsection, this is equivalent to thoroughly exploring 
a single transect of sequence space passing through 
a single point in region 2 (Figure 4C,D). The possible
sequences in region 1 will be much more thoroughly 
represented in this restricted library than in a library 
in which both regions 1 and 2 are randomized 
simultaneously. The fittest clone in the transect is
then selected from the restricted library to identify
an optimal region 1 sequence. The process is then 
reiterated, but this time the region 1 sequence is fixed 
at its optimum, region 2 is thoroughly randomized, 
and again the fittest clone is selected. In theory, the 
optimum region 1 and region 2 sequences identified 
in this two-step process should together constitute 
the overall optimum sequence. This supposition is 
justified for an additive fitness landscape like Figure 
4A, since every transect yields the same optimum 
(Figure 4C); but for nonadditive landscapes like 
Figure 4B, the results can be very different, depend- 
ing on what happens to be chosen as the fixed 
sequence in region 2 (Figure 4D). A stepwise search
makes sense when the randomized positions can be 
subdivided into well-defined, separate parts of a 
protein domain. 
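The two-step stepwise search, and its dependence on additivity, can be checked by brute force on tiny invented landscapes. In this sketch each region has only three hypothetical sequences (indices 0-2). `stepwise_search` fixes region 2, picks the best region 1 sequence on that transect, and then optimizes region 2 with region 1 held at that choice; `exhaustive_search` finds the true optimum for comparison.

```python
def stepwise_search(fitness, n1, n2, fixed_region2):
    """Two-step search: optimize region 1 with region 2 fixed, then region 2."""
    # Step 1: one transect through the fixed region 2 sequence.
    best1 = max(range(n1), key=lambda i: fitness(i, fixed_region2))
    # Step 2: fix region 1 at that optimum and scan region 2.
    best2 = max(range(n2), key=lambda j: fitness(best1, j))
    return best1, best2


def exhaustive_search(fitness, n1, n2):
    """Global optimum by brute force (feasible only for toy spaces)."""
    return max(((i, j) for i in range(n1) for j in range(n2)),
               key=lambda ij: fitness(*ij))


# Hypothetical additive landscape: fitness is a product of region factors.
f1, f2 = [2, 7, 3], [1, 4, 6]


def additive(i, j):
    return f1[i] * f2[j]


# Hypothetical nonadditive landscape: a broad low peak and an isolated high one.
table = [[5, 4, 0],
         [4, 3, 0],
         [0, 0, 9]]


def nonadditive(i, j):
    return table[i][j]


for start in range(3):
    print(start, stepwise_search(additive, 3, 3, start),
          stepwise_search(nonadditive, 3, 3, start))
print("global:", exhaustive_search(additive, 3, 3),
      exhaustive_search(nonadditive, 3, 3))
```

On the additive landscape every starting point leads to the global optimum (1, 2); on the nonadditive one, starting from region 2 sequence 0 or 1 strands the search on the lower peak at (0, 0).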

Two main methods have been used to introduce 
mutations into selected clones. Error-prone poly- 
merase chain reaction (PCR) is used when many 
clones must be simultaneously mutagenized en masse, 
as in the nongreedy strategy. 101-103 In contrast, when 
random mutations must be introduced into a confined 
segment of a single clone, as is typically the case in 
the greedy and stepwise strategies, incorporation of
degenerate oligonucleotides (as in the construction of
random peptide libraries; section II.C) is an efficient 
method in which the frequency and uniformity of 
mutations can be easily controlled. 
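How the mutation frequency of a degenerate (doped) oligonucleotide might be tuned can be shown with one common idealization, assumed here rather than taken from the review. Each of n nucleotide positions independently carries a non-wild-type base with probability `doping`, so the number of substitutions per clone is binomially distributed. Because some substitutions are silent, the amino acid mutation rate would be somewhat lower than this nucleotide-level figure.

```python
from math import comb


def substitution_distribution(n_nucleotides, doping):
    """P(k substitutions) for k = 0..n in a doped oligonucleotide segment.

    Binomial model (an assumption): each position independently carries a
    non-wild-type base with probability `doping`.
    """
    return [comb(n_nucleotides, k) * doping**k * (1 - doping)**(n_nucleotides - k)
            for k in range(n_nucleotides + 1)]


# Hypothetical example: an 8-codon (24-nucleotide) segment doped at 3%.
dist = substitution_distribution(24, 0.03)
mean = sum(k * p for k, p in enumerate(dist))
print(f"mean substitutions per clone: {mean:.2f}")  # 0.72
for k in range(4):
    print(f"P({k} substitutions) = {dist[k]:.3f}")
```

Raising or lowering `doping` shifts the whole distribution, which is the sense in which the frequency of mutations is easy to control with this method.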

VI. Effect of Conformational Constraints 

Unlike natural proteins or protein domains, ran- 
dom peptides do not generally fold into a well-defined 
three-dimensional structure. However, constraints 
can be artificially imposed on the peptide in order to 
greatly reduce the range of conformations available 
to it. In general, a library of constrained peptides 
will represent far fewer three-dimensional shapes 
than a library of unconstrained (but otherwise com- 
parable) peptides. As a consequence, the probability 
that a clone will possess the target activity (affinity
for a receptor, for instance) is correspondingly re-
duced. 74,104 On the other hand, a constrained peptide
whose accessible conformations happen to overlap 
extensively with active conformations may possess 
far higher activity than any unconstrained peptide. 

The most common constraint on displayed peptides 
is a disulfide bond between two half-cystine residues 
at fixed positions in an otherwise random sequence; 
many such constructs are listed in Table 1. Because 
the phage coat proteins are exported into the oxidizing
milieu of the periplasm and ultimately released into the
extracellular medium with abundant dissolved oxy-
gen, cysteine residues within a single displayed
peptide can be expected to form intrapeptide disul- 
fides in at least a portion of the displayed peptides; 
interchain disulfides are much less likely, since the 
distance between neighboring coat-protein subunits 
is at least 10 times longer than a disulfide bond. In 
several cases, the disulfide bond has been shown to 
be required for the ability of the displayed peptide 
to bind a target receptor. 32,61,93,105-110 In general, the
closer the half-cystines, the tighter the constraint 
imposed on the amino acids lying between them. 
Thus, disulfides spanning different numbers of amino 
acid positions would be expected to impose very
different, mutually exclusive conformational con- 
straints when the numbers are small. In contrast, 
disulfides spanning more than about six residues 
probably impose relatively weak constraints that are 
compatible with a great diversity of conformations. 

Coordination bonds between histidine residues and 
metal ions can constrain peptides in much the same 
way as disulfide bonds. De Ciechi and co-workers, 
for instance, reported that a monoclonal antibody 
affinity-selected peptides with the motif HXG(A/T)- 
XH and that binding was abolished in the presence 
of the metal-chelating agents EDTA and EGTA. 111 

A second way of constraining peptides is to present 
them in the context of a protein scaffold. In this case, 
random peptides can be presented not only as loop 
structures, but also as parts of α helices, β sheets,
turns, and other elements of secondary structure. 
Table 1 includes many such libraries; the host scaf-
folds include human growth hormone, 112 bovine pan-
creatic trypsin inhibitor, 96 antibodies, 113 minibodies
(next paragraph), cytochrome b562, 114 zinc fingers, 115
Tendamistat, 116 Kunitz domain, 64,98 and the IgG-
binding domain of streptococcal protein A. 117,118 With
the exception of the minibody, zinc-finger, and Ten-
damistat scaffolds, most of these scaffolds have displayed a
single randomized loop or region with the aim of
optimizing the binding of the modified protein
to its natural receptor. Here we will consider the
applications of protein scaffolds for construction of 
universal constrained peptide libraries. 

Minibodies have been particularly thoroughly in-
vestigated as a host scaffold. 99,119-122 A minibody is
a 61-residue peptide comprising three strands from
each of the two β-sheets of an immunoglobulin
variable region domain, along with the exposed H1
and H2 hypervariable regions. Such a minibody was
displayed on the surface of the f1 bacteriophage and
the two hypervariable loops randomized to create a
constrained peptide library, from which clones with
high affinity for human IL-6 were affinity-selected.

Bianchi and his colleagues constructed a confor- 
mationally homogeneous peptide library by random-
izing five positions in the α-helical portion of a zinc-
finger motif displayed on pVIII. 115 A monoclonal
antibody specific for the lipopolysaccharide of the 
human pathogen Shigella flexneri was used to affin-
ity-select clones from the library, yielding a consensus
motif with strong, zinc-dependent affinity for the 
antibody. Moreover, affinity for the antibody was 
retained when the same five side chains were trans- 
ferred to a synthetic scaffold that holds them in an 
α-helical geometry. This ability to transfer a motif
to a peptidomimetic scaffold has great potential 
importance for use of phage display for drug discov- 
ery, since peptides themselves are not considered 
auspicious starting points for therapeutics (section 
VII.D). 

VII. Applications 

A. Target Receptors Used in Affinity Selection 

In Table 3 we list target receptors that have been 
used to affinity-select peptides from phage-display 



libraries. It is evident that the diversity of targets 
is very wide, encompassing not only conventional 
receptors like antibodies and hormone receptors but 
also (for instance) plastic surface 123 and whole organs 
in a living mouse. 91 Although most of the receptors 
recognize natural ligands that are proteins, some of 
them recognize nonproteinaceous ligands like carbo- 
hydrates, and some (e.g., plastic surface) have no 
natural ligand at all. In the subsections that follow, 
we will discuss a few of the major applications that 
these selection experiments have in mind. 

B. Epitope Mapping and Mimicking 

An "epitope" is the small determinant on the 
surface of a ligand with which the receptor makes 
close, geometrically and chemically specific contact. 
If the ligand is a protein, the epitope is sometimes 
"continuous," comprising a few adjacent critical amino 
acids in the primary sequence. For instance, anti- 
bodies specific for continuous epitopes on protein 
antigens typically contact three to four critical amino 
acids over a six-residue segment. More often, how- 
ever, protein epitopes are more complex. Many are 
"discontinuous" because they comprise critical bind- 
ing residues that are distant in the primary sequence 
but close in the folded native conformation. And 
many epitopes, including discontinuous ones, are 
"conformation-dependent" because they require the 
context of the overall protein structure to constrain
them in a binding conformation. 

In many research contexts, it is highly desirable 
to "map" the epitope to a confined portion of the 
natural protein ligand. If the epitope is (or might 
be) continuous and not conformation dependent, 
random peptide libraries provide a cheap, easy approach to this goal. 13,80,105,124-134 The receptor is used
to affinity select random peptide ligands, and the 
sequence motif in the selected peptides is compared 
to the amino acid sequence of the natural ligand. 
Often, in these cases, the motif clearly matches 
critical binding amino acids in the natural protein 
ligand, thereby mapping the epitope to a very narrow 
part of the overall natural ligand structure. Since 
this approach uses replicable, widely available, all- 
purpose random peptide libraries and simple micro- 
biological procedures, it is generally much cheaper 
and easier than alternative epitope mapping methods 
that require chemical synthesis of short peptide 
segments of the ligand's amino acid sequence. 135 
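The comparison step described above can be sketched in Python. This is a minimal illustration, not the procedure used in the cited studies: it assumes the selected peptides have already been aligned to equal length, takes the most frequent residue at each position as the consensus, and scores ungapped windows of the natural ligand by simple identity counts. The peptide and ligand sequences are invented for the example.

```python
from collections import Counter


def consensus(peptides):
    """Most frequent residue at each position of pre-aligned, equal-length peptides."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to a common length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*peptides))


def best_match(motif, ligand):
    """Slide the motif along the ligand sequence without gaps.

    Returns (start, identities) for the window with the most identical residues;
    ties go to the earliest window.
    """
    best_start, best_score = 0, -1
    for start in range(len(ligand) - len(motif) + 1):
        window = ligand[start:start + len(motif)]
        score = sum(a == b for a, b in zip(motif, window))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical peptides selected by a receptor, and a hypothetical natural ligand.
selected = ["RQISFV", "KQISFL", "RQLSFV"]
ligand = "MKTAYIAKQKQISFVKSHFSRQ"

motif = consensus(selected)
start, identities = best_match(motif, ligand)
print(motif, start, identities)  # RQISFV 9 5
```

Here the consensus matches five of six residues in one segment of the hypothetical ligand, which is the kind of result that would localize the epitope. Real analyses would usually allow gaps and conservative substitutions rather than counting identities only.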

Only rarely will a random peptide library contain 
a binding motif extending to more than about six 
amino acids or adequately represent conformation- 
dependent or discontinuous epitopes. Although re- 
ceptors recognizing such epitopes often select ligands 
from random peptide libraries, these artificial ligands 
seldom bear a recognizable similarity to any part of 
the natural protein ligand at the amino acid sequence 
level. An alternative approach in such circumstances 
is to construct a gene-specific library displaying 15-100 amino acid segments of the natural amino acid sequence, 132,136 segments that are long enough to occasionally include
small elements of secondary structure from the 
native protein. Such libraries sometimes contain 
good ligands for receptors that fail to select ligands 
from random peptide libraries. Because it requires construction of a specific library for each new ligand gene, however, this approach is much more arduous than use of all-purpose random peptide libraries.

fibroblast growth factor
carbohydrate Lewis Y antigen
acetylcholine receptor
angiotensin II
glycoprotein D of herpes simplex virus type I
oncoprotein p185HER2
keratin
plasminogen activator inhibitor type-1
bluetongue virus VP7
FLAG octapeptide
Na+/K+-ATPase β-subunit
hepatitis B virus surface antigen
dengue virus
dystrophin
von Willebrand factor
HCV core protein
β-endorphin
acetylcholine receptor
cytochrome b
proenkephalin
cell surface antigen B7-1
prostate-specific membrane antigen
P1,P4-diadenosine 5'-tetraphosphate receptor
36-mer peptide from viral hemagglutinin, cyclic peptides, Pseudomonas aeruginosa pilin, Bordetella pertussis pilin, HIV gp120, rabbit muscle, L-type calcium channels, worm muscle myohemerythrin, lysozyme, trisaccharide on the O-antigen of Salmonella paratyphi, tetrasaccharide on the O-antigen of Shigella flexneri
polyclonal antibodies
anti-biotin
anti-human lymphotoxin
anti-mouse IgG Fc
from sera of rheumatoid arthritis patients
anti-TNFα from sera of rheumatoid arthritis patients
anti-hepatitis B virus envelope protein
anti-lymphotoxin
anti-synthetic peptide
from synovial fluid of rheumatoid arthritis patients
from sera of hepatitis C virus-infected patients
chimeric antibody
ELAM1-mouse IgG2b
nonantibody protein receptors
streptavidin
concanavalin A
calmodulin
tumor suppressor protein p53
Src homology 3 (SH3) domains
Src homology 3 domain (D-stereomer)
urokinase receptor
integrin IIb/IIIa
integrin α5β1
heat shock cognate protein Hsc70
tissue factor VIIa
atrial natriuretic peptide A receptor
BiP chaperone

library | refs
pIII/X6 | 133
pIII/X12, X20 | 134
pIII/X10 | 162
pIII/X15 | 147
pIII/X30 | 127
pIII/X6 | 144
pIII/X9, CXsC | 128
pIII/X6' | 143
pIII/Xa, CX6C | 32
pIII/X15 | 148
pVIII/X9, CX9C | 149
pIII/X6, X12, X20 | 124,205
pIII/X6 | 110
pIII/Xs | 206
pIII/X10 | 168
pIII/X15 | 106
pIII/X15 | 76
pIII/X30 | 125
pIII/X6 | 207
pIII/X15 | 208
pIII/X15 | 209
pIII/X30 | 127
pVIII/X20 | 45
pIII/Xs | 210
pIII/Xs, X9 | 211
pIII/Xs, X12, X20 | 212
pIII/X10, X,8, CX8-12C | 111
pIII/X37, x« | 57
pIII/X6 | 146
11 pVIII libraries | 171
pIII/X6
pIII/X38
pVIII/X9
pVIII/X9
pVIII/CX9C
pIII/X6, X15
pVIII/9-mer
pVIII/X9
pIII/X6
pVIII/X9, CX9C
pIII/X15
pIII/X6
pIII/X6
pIII/X38
pIII/CX6C
pIII/CX4-6C
pIII/X9
pIII/X15
pIII/Xs, X8
pIII/X15
pIII/X6, X12, X20
pIII/X15
pIII/X6
pIII/X8, X22, X36
pIII/X10
pIII/X10
pIII/CX6C
pIII/X6
pIII/X7
pIII/X6, X15

receptors | library | refs
nonantibody protein receptors
fibronectin | pIII/CX6C | 63
erythropoietin receptor | various | 93
E-selectin | pIII/X8, X12, CX2-6C | 94
CD1-β2M complex | pVIII/X22 | 220
stromelysin, matrilysin | pIII/X6 | 88
tissue-type plasminogen activator | pIII/X6 | 221
Ca2+-binding protein S-100b | pIII/X15 | 66
α-chymotrypsin | pIII/X6 | 222
HIV-1 nucleocapsid protein NCp7 | pVIII/CX9C | 67
core antigen of hepatitis B virus | pIII/X6 | 48
fibroblast growth factor receptor 1 | X26 | 54
trypsin | pIII/X6 | 223
nucleic acids
single-stranded DNA | pIII/X6 | 160
matrix attachment region DNA | pIII/CX9C | 59
small organic ligands
biotin | pVIII/CX9C | 224
dioxin | pVIII/X8 | 29
whole cells
insect and mammalian cells expressing human urokinase plasminogen activator receptor | pIII/X15 | 51
platelets | pIII/X15 | 52
| pIII/X6 | 53
brain, kidney | pIII/X9, CX5-7C, CX]8C, X20 | 91
plastic | pIII/X36, X22 | 123

a nr = not reported.
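The point made earlier in this subsection, that a random peptide library will only rarely contain a binding motif longer than about six residues, follows from simple counting. The sketch below is a back-of-the-envelope Poisson estimate under stated assumptions: all 20 amino acids are taken as equally likely at every position (real degenerate-codon schemes are biased), clones are independent, each clone contributes one contiguous window by default, and the library size of 10^9 clones is hypothetical.

```python
import math


def fraction_covered(n_clones, motif_length, windows_per_clone=1):
    """Expected fraction of all 20**motif_length sequences present at least once.

    Poisson approximation: each of n_clones * windows_per_clone random windows
    hits a given sequence with probability 1 / 20**motif_length.
    """
    possible = 20 ** motif_length
    samples = n_clones * windows_per_clone
    return 1.0 - math.exp(-samples / possible)


N_CLONES = 10 ** 9  # hypothetical library size

for k in range(5, 11):
    print(f"{k}-residue motifs: {20 ** k:.1e} possible, "
          f"coverage {fraction_covered(N_CLONES, k):.3g}")
```

Under these assumptions essentially every hexapeptide is present, about half of all heptapeptides, and only about 4% of octapeptides. Displaying longer peptides adds more windows per clone (the `windows_per_clone` parameter), but that gain is linear while the number of possible sequences grows twentyfold per added residue.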

As emphasized in section IV.D, affinity selection 
from random peptide libraries often reveals entirely 
unexpected ligands— ligands that do not match any 
linear epitope and that could not have been antici- 
pated from even extensive knowledge of the receptor 
and/or its natural ligand. This is especially so when 
the receptor's natural epitope is nonproteinaceous or 
is a discontinuous or conformation-dependent protein 
epitope (previous paragraph). Geysen and his col- 
leagues introduced the term "mimotope" to refer to 
small peptides that specifically bind a receptor's 
binding site (and in that sense mimic the epitope on 
the natural ligand) without matching the natural 
epitope at the amino acid sequence level; 41,12 the 
definition includes cases where the natural ligand is 
nonproteinaceous. Mimotopes are usually of little 
value in mapping natural epitopes, but may have 
other important uses, as will be illustrated in the 
next few sections. 

C. Identifying New Receptors and Natural 
Ligands 

A ligand for a receptor can be used as a "probe" to 
identify new receptors that bind the same ligand. 
Sparks and his colleagues 138-140 and others 100,104,137 used this approach to identify novel SH3 domains, a family of homologous, ~60-residue, protein-binding modules found in a great variety of signaling and cytoskeletal proteins. In the first step, a number of cloned SH3 domains were used to affinity-select specific ligands from random peptide libraries. Then, in the second step, these peptides were used to probe a conventional cDNA expression library for proteins that bind the peptide. Eighteen SH3 domains were identified in this way, nine of which were previously
unknown. Systematic studies like these serve to 
deepen understanding of cell biology by interconnect- 
ing signaling pathways not hitherto known to be 
related. 

In a few very favorable cases, identifying peptide 
ligands from a random peptide library may suffice 
to find the natural ligand for an "orphan receptor"— a 
receptor whose natural ligand is unknown. Thus, for 
example, Ivanenkov et al. 66 affinity-selected peptides that specifically bind the Ca2+-dependent binding protein S-100b. They shared a motif of eight amino acids, and analysis of sequence data banks identified a similar motif in the α-subunit of actin capping
proteins. The interaction of these two proteins was 
subsequently shown to be biologically significant. 
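A data-bank search of this kind amounts to scanning many sequences for a degenerate motif. The Python sketch below assumes the motif has been written as a regular expression; the pattern and the three sequences are hypothetical stand-ins (the actual S-100b-binding motif is not reproduced here), and dedicated sequence-search tools would normally be used against real data banks.

```python
import re


def scan(pattern, database):
    """Return (name, start, matched_text) for every match of pattern in each sequence.

    Wrapping the pattern in a lookahead lets overlapping matches be reported too.
    """
    finder = re.compile(f"(?=({pattern}))")
    hits = []
    for name, sequence in database.items():
        for m in finder.finditer(sequence):
            hits.append((name, m.start(), m.group(1)))
    return hits


# Hypothetical eight-position motif: K or R, L, any, any, W, L or I, any, E.
MOTIF = r"[KR]L..W[LI].E"

# Hypothetical sequence "data bank".
DATABASE = {
    "protein_A": "MSKLAEWLKEGTT",
    "protein_B": "MGRLQQWIAEPRLDDWVSE",
    "protein_C": "MAAAAAAAAAAAA",
}

for hit in scan(MOTIF, DATABASE):
    print(hit)
```

Each reported hit is only a candidate partner; as in the S-100b case, the biological significance of a match has to be established by independent experiments.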

D. Drug Discovery 

Many of the receptors used in affinity selection are 
targets of drug discovery programs, and the peptide 
ligands selected by them are therefore potential leads 
to new drugs. 53,61,64,94,98,108,109,120,122,141-146 Such peptides might act as receptor agonists or antagonists
(for example, of enzymes or hormone receptors) or 
otherwise modulate the receptor's biological effect. 

Affinity selection resembles in essence the tradi- 
tional approach to drug discovery: screening libraries 
of synthetic compounds or natural products for 
substances that bind the target receptor and that 
might therefore be leads to new agonists, antagonists, 
or modulators. There are important differences, 
however. Affinity selection has the key advantage that the scale of the search is many orders of magnitude greater than is feasible when chemical libraries must be screened compound by compound: billions of peptides versus tens of thousands of chemicals. On the other hand, for most pharmaceutical applications, peptides have poor pharmacological properties, being generally orally unavailable and subject to rapid degradation in the body by naturally occurring enzymes. There is some precedent for synthesizing peptidomimetic compounds that mimic the essential pharmacological features of bioactive peptides on a nonpeptide scaffold (section VI). But developing peptidomimetics is an arduous and chancy project in medicinal chemistry, and it seems likely that the most important contribution of phage display to drug discovery will be confined to applications where peptides themselves can serve as plausible therapeutics. For example, Wrighton and colleagues 93 used phage display to identify a small peptide agonist of the receptor for erythropoietin, a protein hormone that is administered parenterally in some circumstances. The small peptide, which bears little resemblance to the natural hormone at the amino acid sequence level, might serve as a superior substitute for the much larger protein. Vaccines (next subsection) are another case in which peptides are eminently usable therapeutics.

406 Chemical Reviews, 1997, Vol. 97, No. 2
Smith and Petrenko

Peptides composed of D-amino acids are much less 
susceptible to degradation in the body than peptides 
composed of the natural L-amino acids. Schumacher 
and his colleagues have put forth a clever (if expen- 
sive!) way of using phage display to identify D-amino 
acid peptide ligands for target receptors. 145 They 
synthesized chemically the D form of an SH3 domain 
and used it to affinity select ligands from a random 
peptide library, whose amino acids are of course the 
natural L isomers. The D forms of these peptides are 
therefore ligands for the natural L form of the 
receptor— the form that would be the actual target 
of drug discovery. 

E. Epitope Discovery— A New Route to Vaccines 
and Diagnostics 

When the receptor used for affinity selection is an 
antibody, the peptides it selects from random peptide 
libraries are called "antigenic mimics" of the corre- 
sponding natural epitope— the antigenic determinant 
that elicited the selector antibody in the first place. 
When these peptides are used in turn to immunize 
naive animals, some are able to elicit new antibodies 
that cross-react with the natural epitope, even though 
the naive animals have never been directly exposed to it. 63,72,74,76,77,80,83,147-150 Such peptides are "immunogenic mimics" as well as antigenic mimics.

By no means are all antigenic mimics immunogenic
mimics in this sense, 151 however, and undoubtedly 
many failures of immunogenic mimicry have gone 
unreported. There are at least two highly plausible 
scenarios according to which a peptide that binds its 
selector antibody (thus qualifying as an antigenic 
mimic) would not be able to elicit cross-reacting 
antibodies when used to immunize naive animals 
(thus failing as an immunogenic mimic). First, if it 
is flexible— and most small peptides are— it might 
adopt one conformation when it binds the selector 
antibody but myriad other conformations when it 
elicits new antibodies, few if any of which would 
therefore cross-react with the authentic epitope. 
Second, a peptide may be an antigenic mimic without 
being a true structural mimic. Such a peptide would bind the selector antibody in an entirely different way
than does the original authentic epitope, via alto- 
gether different interactions. Just so, peptides with 
the motif -HPQ- bind the biotin-binding pocket of 
streptavidin differently than does biotin itself. 152 
Such a peptide would be expected to elicit new 
antibodies that fit it in an altogether different way 
than does the original selector antibody; only rarely 
and coincidentally would these antibodies cross-react 
with the authentic epitope. 

Antigenic and immunogenic mimicry are the basis 
of "epitope discovery," 72-83 a new approach to disease
diagnosis and vaccine development. Most diseases— 
particularly infectious diseases— leave their imprint 
on the complex mixture of antibody specificities that 
makes up the total serum immunoglobulin population. Included in this population are disease-specific
antibodies— some elicited directly by antigens on a 
pathogen, others possibly recognizing antigens that 
reflect the disease process more indirectly. When 
total serum antibody from a patient is used to affinity 
select clones from a random peptide library, there- 
fore, some of the selected ligands will correspond to 
disease-specific antibodies. Of course the patient's 
pool of antibodies will contain myriad non-disease- 
specific antibodies, too, so it may require extensive 
counterselection or screening with antibodies from 
control subjects (not suffering from the disease) to 
identify those peptides that correspond to authentic 
disease-related antibody specificities and that there- 
fore can be considered diagnostic for the disease. This 
is an eminently "portable" program of discovery, 
using the same procedure and the same "all-purpose" 
random peptide libraries regardless of the particular 
disease. Even in the most difficult cases, it nets a 
rich diversity of diagnostic peptides with far less work 
than is required to identify antigenic peptides by 
direct study of a pathogen's antigenic makeup. 
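The counterscreening step can be pictured as a simple filter over reactivity data. In the sketch below, the per-serum reactivity calls, the peptide names and both thresholds are hypothetical; a real study would rely on quantitative binding assays and appropriate statistics rather than yes/no calls.

```python
def diagnostic_peptides(patient_hits, control_hits,
                        min_patient_fraction=0.5, max_control_fraction=0.0):
    """Keep peptides bound by many patient sera but by (almost) no control sera.

    patient_hits / control_hits map peptide -> list of booleans, one per serum,
    True meaning antibodies in that serum bound the peptide.
    """
    keep = []
    for peptide, hits in patient_hits.items():
        controls = control_hits.get(peptide)
        if not hits or not controls:
            continue  # not tested against both groups: cannot be called disease-specific
        patient_fraction = sum(hits) / len(hits)
        control_fraction = sum(controls) / len(controls)
        if (patient_fraction >= min_patient_fraction
                and control_fraction <= max_control_fraction):
            keep.append(peptide)
    return sorted(keep)


# Hypothetical screening results for three selected peptides.
patients = {
    "PEP1": [True, True, False, True],
    "PEP2": [True, True, True, True],
    "PEP3": [False, False, True, False],
}
controls = {
    "PEP1": [False, False, False],
    "PEP2": [True, False, False],  # also bound by a control serum: not disease-specific
    "PEP3": [False, False, False],
}
print(diagnostic_peptides(patients, controls))  # ['PEP1']
```

PEP2 is discarded even though every patient serum binds it, because a control serum does too; this is exactly the non-disease-specific signal that counterselection is meant to remove.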

Peptides obtained through epitope discovery have 
at least two obvious uses. First, as antigenic mimics 
they serve as specific probes for antibodies that are 
diagnostic for the diseases, much as natural viral 
proteins serve in current tests for HIV. Their ad- 
vantages over natural antigens as diagnostic re- 
agents include that they are easier and cheaper to 
discover and manufacture, that they can focus on a 
few particularly diagnostic specificities and exclude 
potentially confusing signals from nondiagnostic 
determinants, and that they can be discovered and 
used even when the natural antigens associated with 
the disease are entirely unknown. 

The second possible use of peptides obtained through 
epitope discovery is as components of synthetic 
vaccines. Only antigenic mimics that are also im- 
munogenic mimics are useful in this regard, of 
course, since in order to be protective an antibody 
must react with a natural epitope on the actual 
pathogen. 

F. Selection of DNA-Binding Proteins 

Phage display may help molecular biologists realize 
a long-standing goal: to design proteins that specif- 
ically bind a given target DNA sequence. Rational 
design has poor prospects in this field, since there 
do not seem to be simple rules of complementarity, comparable to those governing base-pairing between complementary single-stranded nucleic acids, by which the sequence specificity of a DNA-binding protein can be predicted from the amino acids at critical positions in its structure. A much more
promising approach is to construct a library of 
randomized variants of a parent DNA binding do- 
main (e.g., one of the zinc-finger domains, a common 
DNA-binding motif in eukaryotic nuclei) displayed 
on a filamentous phage; randomization is concen- 
trated on positions that are thought to make sequence- 
specific contacts with the target DNA in the parent 
domain. From this library, clones that bind a new 
target DNA sequence, different from that recognized 
by the parent domain, are then affinity-selected. 153 " 153 
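Library construction of this kind can be sketched as follows. The parent sequence and the chosen positions are hypothetical placeholders, not the residues randomized in the cited work. The sketch draws uniformly from the 20 amino acids at each chosen position and, separately, counts the theoretical diversity when each position is encoded by an NNK codon (N = any base, K = G or T), a common degenerate-codon scheme with 32 codons covering all 20 amino acids plus one stop codon.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def randomize(parent, positions, rng):
    """Copy parent, replacing each listed 0-based position with a random residue."""
    residues = list(parent)
    for pos in positions:
        residues[pos] = rng.choice(AMINO_ACIDS)
    return "".join(residues)


def theoretical_diversity(n_positions):
    """(distinct protein variants, distinct NNK DNA variants) for n randomized sites."""
    return 20 ** n_positions, 32 ** n_positions


PARENT = "RSDELTRHIRIH"   # hypothetical recognition-helix-like segment
POSITIONS = [0, 2, 3, 6]  # hypothetical "contact" positions to randomize

rng = random.Random(42)   # fixed seed so the sketch is reproducible
for _ in range(3):
    print(randomize(PARENT, POSITIONS, rng))
print(theoretical_diversity(len(POSITIONS)))  # (160000, 1048576)
```

Concentrating randomization on a few putative contact positions keeps the theoretical diversity (here about 10^6 DNA variants) small enough to be sampled completely by a phage library, which would not be true if the whole domain were randomized.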

In an experiment analogous to epitope mapping
(subsection B above), phage display has been used 
to map the DNA binding site of SATB1, a nuclear 
matrix protein that specifically binds the minor 
groove of a DNA sequence motif called MAR. Using 
an MAR DNA sequence as the immobilized receptor, 
Wang and colleagues 59 affinity-selected peptides from 
a random peptide library; the predominant peptide 
shared 50% sequence identity with a nine-residue 
segment of the SATB1 sequence— a segment that was 
subsequently shown on independent grounds to be 
critical for DNA recognition. Phage display has also 
been used to affinity-select a hexapeptide with some 
binding preference for the single-stranded heptadeoxycytidylate (dC)7, although in this case no mapping purpose was in view. 160

G. Landscape Libraries as a Source of New 
Materials 

The surface landscape of a filamentous virion is a 
cylindrical array of thousands of repeating subunits
composed of the exposed parts of the major coat 
protein pVIII; this exposed shell accounts for about 
half the weight of the particle. When a random 
peptide is displayed on every copy of this protein, it 
subtends a major fraction (20% or more) of the 
repeating unit and thus of the entire particle surface. 
Unless the random peptide is loosely tethered to the 
bulk of the major coat protein, it is forced to interact 
with residues in its immediate neighborhood, and 
may therefore be constrained in a definite three- 
dimensional conformation that differs markedly from 
the surface conformation of wild-type particles 35 and 
of clones displaying other random peptides. A large 
population of such clones can therefore be regarded 
as a library of "organic landscapes". 29 

The ensemble of a random peptide in a landscape 
library with its surrounding wild-type residues may 
have emergent properties that are lost when the 
peptide is excised from its context. Such peptides are 
analogous to the complementarity-determining re- 
gions of antibodies— oligopeptide loops that in the 
context of the intact protein make most of the specific 
contacts with antigen but as free peptides seldom 
have appreciable antigen-binding propensities. In 
most applications to date, such emergent properties 
inhere in a single peptide and its immediate neigh- 
borhood. Localizable emergent properties are present 
even when the foreign peptide is displayed on only 
an occasional pVIII molecule, as in type 88 and 8+8 systems. Nevertheless, the high-density display in
landscape phage may greatly enhance overall ef- 
fectiveness in some applications. For instance, if a 
single target receptor complex can bind two or more 
neighboring peptides on the phage surface, the 
overall effective affinity may be enhanced many 
orders of magnitude compared to monovalent bind- 
ing. 
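The size of this avidity effect can be illustrated with a deliberately simplified model of a bivalent interaction that ignores statistical factors and rebinding kinetics. Suppose each peptide-receptor contact has intrinsic association constant K_a, and once one contact has formed, the neighboring peptide sits at an effective local concentration C_eff relative to the second binding site. Then singly bound complexes contribute K_a and doubly bound ones add K_a(K_a C_eff), so the apparent association constant is roughly K_app = K_a(1 + K_a C_eff). The values below (K_a = 10^6 M^-1, C_eff = 1 mM) are hypothetical.

```python
def apparent_affinity(ka, c_eff):
    """Apparent association constant (M^-1) for a simplified bivalent interaction.

    ka:    intrinsic association constant of one contact, M^-1
    c_eff: effective local concentration of the second peptide once the
           first is bound, M
    Singly bound complexes contribute ka; doubly bound ones add ka * (ka * c_eff).
    """
    return ka * (1.0 + ka * c_eff)


KA = 1e6      # hypothetical monovalent affinity, M^-1
C_EFF = 1e-3  # hypothetical effective concentration, M

k_app = apparent_affinity(KA, C_EFF)
print(f"apparent K = {k_app:.3g} M^-1, gain = {k_app / KA:.0f}-fold")
```

With these numbers the gain factor 1 + K_a C_eff is about a thousand, i.e. three orders of magnitude; larger intrinsic affinities or more neighboring peptides would raise it further.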

Some emergent properties are not localizable to a 
single subunit but seem instead to be a global 
property of the entire surface landscape. Thus, for 
instance, phage clones that are highly resistant to 
chloroform were selected from a landscape library; 
their entire surface is composed of hybrid pVIII 
subunits displaying a peptide motif that confers 
resistance to the solvent. In contrast, mosaic phage 
coated with roughly equal numbers of such hybrid 
subunits and wild-type subunits showed almost no 
resistance, indicating that resistance is not an addi- 
tive property to which each hybrid subunit contrib- 
utes independently. 29 

Landscape phage might be looked on as a new kind 
of submicroscopic "fiber." Each phage clone is a type 
of fiber with unique surface properties. These fibers 
are not synthesized one by one with some use in 
mind. Instead, billions of fibers are constructed, 
propagated all at once in a single vessel, and portions 
of this enormous population are distributed to mul- 
tiple end-users with many different goals. Each user 
must devise a method of selecting from this popula- 
tion those fibers that might be suitable for his or her 
particular application— by affinity selection or what- 
ever other selection principle ingenuity can conjure 
up. 

Localizable or global emergent properties cannot 
be transferred from the virion surface to another 
medium; any application that depends on such prop- 
erties must therefore use phages themselves as the 
new material. This undoubtedly precludes some 
applications of phage "fibers": it is doubtful we will 
be wearing clothes made of them, for instance. Still, 
filamentous phages are essentially proteins manu- 
factured by a fermentation process and as such are 
potentially usable in any of the myriad of applications 
that might be contemplated for such proteins. 

H. Phage Display— Combinatorial Chemistry on 
the Cheap 

For drug discovery and a handful of other high- 
profile applications with high commercial stakes, 
phage display is perhaps not an optimal technology. 
For the ordinary research user, however, it has the 
overwhelming advantage that it is cheap and easy. 
It uses standard microbiological techniques that are 
familiar to all molecular biologists, and its key 
resources— phage libraries and clones— are replicable 
and therefore nearly cost-free after their initial 
construction or selection. It is astonishing to con- 
template that within a single 1.5-mL microcentrifuge 
tube we can fit a few hundred trillion phage particles 
displaying billions of different peptide structures— an 
abundance and diversity from which hundreds of 
different users with altogether different purposes in 
mind can select clones of great value. And when that 
tube's supply has nearly run out, we have only to 
propagate what is left to satisfy the needs of hundreds of additional penurious users.
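The arithmetic behind this claim is easy to make explicit. Taking 3 × 10^14 particles as "a few hundred trillion" and 2 × 10^9 as "billions" of clones (both illustrative choices), and assuming hypothetically that each user's selection starts from an aliquot carrying about 1,000 copies of each clone, one tube serves on the order of a hundred selections before it must be re-propagated:

```python
PARTICLES_PER_TUBE = 3 * 10**14  # "a few hundred trillion" (illustrative)
DISTINCT_CLONES = 2 * 10**9      # "billions" of displayed peptides (illustrative)
COPIES_PER_SELECTION = 1_000     # hypothetical copies of each clone per user aliquot

copies_per_clone = PARTICLES_PER_TUBE // DISTINCT_CLONES
selections_per_tube = copies_per_clone // COPIES_PER_SELECTION
aliquot_size = DISTINCT_CLONES * COPIES_PER_SELECTION

print(f"{copies_per_clone:,} copies of each clone per tube")
print(f"{aliquot_size:.1e} particles per user aliquot")
print(f"about {selections_per_tube} selections per tube")
```

Because every clone is represented many times over, each aliquot still samples essentially the whole library, and a single round of propagation restores the supply.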